1
Zhang L, Yi C, Xia X, Jiang Z, Du L, Yang S, Yang X. Solanum aculeatissimum and Solanum torvum chloroplast genome sequences: a comparative analysis with other Solanum chloroplast genomes. BMC Genomics 2024; 25:412. [PMID: 38671394 PMCID: PMC11046870 DOI: 10.1186/s12864-024-10190-9] [Received: 09/30/2023] [Accepted: 03/05/2024] [Indexed: 04/28/2024]
Abstract
BACKGROUND Solanum aculeatissimum and Solanum torvum are Solanum species valued for their high resistance to diseases and adverse conditions. They are frequently used as rootstocks for grafting and are often crossed with other Solanum species to exploit their resistance traits. However, the phylogenetic relationship between S. aculeatissimum and S. torvum within the genus Solanum remains unclear. This study therefore sequenced the complete chloroplast genomes of S. aculeatissimum and S. torvum and compared them with 29 previously published Solanum chloroplast genomes. RESULTS The chloroplast genomes of S. aculeatissimum and S. torvum have the typical quadripartite structure, consisting of one large single-copy (LSC) region, two inverted repeats (IRs), and one small single-copy (SSC) region. The total length of these chloroplast genomes ranged from 154,942 to 156,004 bp, showing little variation. GC content was highest in the IR regions and lowest in the SSC region. The total number of chloroplast genes and of protein-coding (CDS) genes was relatively consistent, ranging from 128 to 134 and from 83 to 91, respectively, whereas the numbers of tRNA and rRNA genes varied notably. Relative synonymous codon usage (RSCU) analysis showed that both S. aculeatissimum and S. torvum prefer codons containing A and U bases. Analysis of the IR boundaries indicated that contraction and expansion occurred primarily at the junctions between the SSC and IR regions. Nucleotide polymorphism and structural variation analyses showed that chloroplast variation in Solanum species occurs mainly in the LSC and SSC regions.
Repeat sequence analysis revealed that A/T was the most frequent motif in simple sequence repeats (SSRs), while palindromic and forward repeats were the most common long sequence repeats (LSRs), with reverse and complement repeats less frequent. Phylogenetic analysis indicated that S. aculeatissimum and S. torvum belong to the same clade and are more closely related to cultivated eggplant. CONCLUSION These findings improve our understanding of chloroplast genomes within the genus Solanum and offer insights for plant classification and evolutionary studies, as well as potential molecular markers for species identification.
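The RSCU metric used in the abstract above can be illustrated with a minimal Python sketch. It assumes an in-frame coding sequence (T or U accepted) and the standard genetic code; the function name `rscu` is hypothetical and this is not the authors' pipeline, only the textbook definition: a codon's observed count divided by the mean count of all codons for the same amino acid.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG x TCAG x TCAG order.
_BASES = "TCAG"
_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AAS)}


def rscu(cds):
    """Relative synonymous codon usage for one in-frame coding sequence.

    RSCU = observed count of a codon / mean count of its synonymous family.
    Values > 1 mean the codon is used more often than under uniform usage.
    """
    seq = cds.upper().replace("U", "T")
    # Count complete codons only; a trailing partial codon is ignored.
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))
    # Group every codon (observed or not) by the amino acid it encodes.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)
    result = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # RSCU is undefined for amino acids absent from the sequence
        mean = total / len(codons)
        for c in codons:
            result[c] = counts[c] / mean
    return result
```

For example, `rscu("GCTGCTGCCGCA")` gives GCT an RSCU of 2.0 and the unused GCG an RSCU of 0.0, because the four alanine codons share four observations. Real analyses pool all coding sequences of a genome before computing RSCU.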
Affiliation(s)
- Longhao Zhang
- College of Horticulture and Landscape Architecture, Yangzhou University, 225009, Yangzhou, China
- Chengqi Yi
- College of Horticulture and Landscape Architecture, Yangzhou University, 225009, Yangzhou, China
- Xin Xia
- College of Horticulture and Landscape Architecture, Yangzhou University, 225009, Yangzhou, China
- Zheng Jiang
- College of Horticulture and Landscape Architecture, Yangzhou University, 225009, Yangzhou, China
- Lihui Du
- College of Horticulture and Landscape Architecture, Yangzhou University, 225009, Yangzhou, China
- Shixin Yang
- College of Horticulture and Landscape Architecture, Yangzhou University, 225009, Yangzhou, China
- Xu Yang
- College of Horticulture and Landscape Architecture, Yangzhou University, 225009, Yangzhou, China
2
Bai Q, Shi L, Li K, Xu F, Zhang W. The Construction of lncRNA/circRNA-miRNA-mRNA Networks Reveals Functional Genes Related to Growth Traits in Schima superba. Int J Mol Sci 2024; 25:2171. [PMID: 38396847 PMCID: PMC10888550 DOI: 10.3390/ijms25042171] [Received: 01/02/2024] [Revised: 02/05/2024] [Accepted: 02/09/2024] [Indexed: 02/25/2024]
Abstract
Schima superba is a valuable timber and fire-resistant tree species widely distributed in southern China. Little is currently known about the genetics of its growth traits, especially with respect to molecular breeding, and this lack of information has delayed the development of modern breeding. The purpose of this study was to identify probable functional genes involved in S. superba growth through whole-transcriptome sequencing. In total, 32,711 mRNAs, 525 miRNAs, 54,312 lncRNAs, and 1522 circRNAs were identified from 10 S. superba individuals differing in wood volume. From the lncRNA-miRNA-mRNA and circRNA-miRNA-mRNA regulatory networks, four possible regulators (three lncRNAs and one circRNA) and eleven key miRNAs were identified, supplying information on ncRNAs. Several candidate genes in the phenylpropanoid and cellulose biosynthesis pathways, including Ss4CL2, SsCSL1, and SsCSL2, and transcription factors, including SsDELLA2 (SsSLR), SsDELLA3 (SsSLN), SsDELLA5 (SsGAI-like2), and SsNAM1, were identified, shedding light on the molecular mechanisms regulating growth traits in S. superba. These results provide candidate functional genes related to S. superba growth traits that will be useful for molecular breeding, and the strategy also offers an effective approach for revealing the mechanisms behind important economic traits in other species.
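The regulatory networks described above are built from two edge lists: ncRNA-miRNA interactions and miRNA-mRNA interactions. A minimal Python sketch of joining them into candidate (ncRNA, miRNA, mRNA) triplets follows; the function and identifiers are hypothetical, and the study's actual pipeline (target prediction tools, expression-correlation filters) is not described in the abstract and is not reproduced here.

```python
from collections import defaultdict


def cerna_triplets(nc_mirna, mirna_mrna):
    """Join ncRNA->miRNA and miRNA->mRNA edges into (ncRNA, miRNA, mRNA) triplets.

    nc_mirna:   iterable of (lncRNA or circRNA id, miRNA id) pairs
    mirna_mrna: iterable of (miRNA id, target mRNA id) pairs
    """
    # Index each miRNA's mRNA targets for fast lookup.
    targets = defaultdict(set)
    for mir, mrna in mirna_mrna:
        targets[mir].add(mrna)
    triplets = set()
    for nc, mir in nc_mirna:
        # An ncRNA and an mRNA sharing a miRNA form one candidate regulatory axis.
        for mrna in targets.get(mir, ()):
            triplets.add((nc, mir, mrna))
    return sorted(triplets)
```

In practice such triplets are usually filtered further, for example by requiring opposite expression trends between each miRNA and its partners, before being drawn as a network.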
Affiliation(s)
- Qingsong Bai
- Guangdong Provincial Key Laboratory of Silviculture, Protection and Utilization, Guangdong Academy of Forestry, Guangzhou 510520, China
3
Mudaki P, Wamalwa LN, Muui CW, Nzuve F, Muasya RM, Nguluu S, Kimani W. Genetic Diversity and Population Structure of Sorghum (Sorghum bicolor (L.) Moench) Landraces Using DArTseq-Derived Single-Nucleotide Polymorphism (SNP) Markers. J Mol Evol 2023:10.1007/s00239-023-10108-1. [PMID: 37147402 DOI: 10.1007/s00239-023-10108-1] [Received: 12/07/2022] [Accepted: 04/02/2023] [Indexed: 05/07/2023]
Abstract
The genetic integrity of accessions should be preserved in germplasm conservation. Molecular characterization of diverse germplasm enhances its conservation and use in breeding programs. The aim of this study was to assess the genetic diversity of 169 sorghum accessions using a total of 6977 SNP markers. The polymorphic information content (PIC) of the markers was 0.31, which is considered moderately high. Structure analysis using the ADMIXTURE program revealed a total of 10 subpopulations. A neighbor-joining tree revealed six main clusters among these subpopulations, whereas principal component analysis identified seven clusters. Cluster analysis grouped most populations according to source of collection, although some accessions from the same source fell into different clusters. Analysis of molecular variance (AMOVA) showed that 30% and 70% of the variation occurred within and among accessions, respectively. Gene flow within the populations was, however, limited, indicating high differentiation within the subpopulations. Observed heterozygosity among accessions ranged from 0.03 to 0.06 with a mean of 0.05, consistent with sorghum being a self-pollinating crop. The high genetic diversity among the subpopulations can be further explored for superior genes to develop new sorghum varieties.
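The PIC and observed-heterozygosity figures quoted above are simple functions of genotype data. A minimal Python sketch follows, assuming diploid genotypes coded 0/1/2 (count of the alternative allele, `None` for missing) and the Botstein et al. (1980) definition of PIC; the function names are hypothetical. Under this definition a biallelic SNP has a maximum PIC of 0.375, whereas some SNP pipelines report 1 − Σp² under the PIC label, so values are only comparable when the formula is known.

```python
def allele_frequency(genotypes):
    """Alternative-allele frequency from diploid genotypes coded 0/1/2 (None = missing)."""
    called = [g for g in genotypes if g is not None]
    return sum(called) / (2 * len(called))


def observed_heterozygosity(genotypes):
    """Fraction of called individuals that are heterozygous (coded 1)."""
    called = [g for g in genotypes if g is not None]
    return sum(1 for g in called if g == 1) / len(called)


def pic(freqs):
    """Polymorphism information content (Botstein et al. 1980) from allele frequencies."""
    homozygosity = sum(p * p for p in freqs)
    # Correction term: sum over allele pairs i < j of 2 * p_i^2 * p_j^2.
    pairs = sum(2 * freqs[i] ** 2 * freqs[j] ** 2
                for i in range(len(freqs)) for j in range(i + 1, len(freqs)))
    return 1 - homozygosity - pairs
```

For a single marker, `p = allele_frequency(g)` and `pic([p, 1 - p])` give its PIC; a panel-level value is typically the mean over all markers.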
Affiliation(s)
- Phoebe Mudaki
- Department of Plant Science and Crop Protection, University of Nairobi, Nairobi, Kenya
- Lydia N Wamalwa
- Department of Plant Science and Crop Protection, University of Nairobi, Nairobi, Kenya
- Catherine W Muui
- Department of Agricultural Science and Technology, Kenyatta University, Nairobi, Kenya
- Felister Nzuve
- Department of Plant Science and Crop Protection, University of Nairobi, Nairobi, Kenya
- Simon Nguluu
- South Eastern Kenya University (SEKU), Kitui, Kenya
- Wilson Kimani
- International Livestock Research Institute (ILRI), Nairobi, Kenya
4
Patton DL, Cardenas T, Mele P, Navarro J, Sung W. CDMAP/CDVIS: context-dependent mutation analysis package and visualization software. G3 (Bethesda) 2022; 13:6887836. [PMID: 36917690 PMCID: PMC10085751 DOI: 10.1093/g3journal/jkac299] [Received: 08/10/2022] [Accepted: 10/17/2022] [Indexed: 12/15/2022]
Abstract
The Context-dependent Mutation Analysis Package and Visualization Software (CDMAP/CDVIS) is an automated, modular toolkit for the analysis and visualization of context-dependent mutation patterns (site-specific variation in mutation rate arising from neighboring-nucleotide effects). CDMAP computes context-dependent mutation rates from a Variant Call Format (VCF) file, a GenBank file, and a reference genome, and can generate high-resolution figures for analyzing variation in mutation rate across spatiotemporal scales. The algorithm has been benchmarked against mutation accumulation data, but it can also be used to calculate context-dependent mutation rates from polymorphism data or closely related species, as long as the input requirements are met. Output from CDMAP can be integrated into CDVIS, an interactive database for visualizing mutation patterns across multiple taxa simultaneously.
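The core quantity described above, a mutation rate per neighboring-nucleotide context, can be sketched in a few lines of Python. This is not CDMAP's code or interface: it assumes a plain reference string, 0-based positions of observed single-nucleotide mutations, and hypothetical `generations` and `lines` normalisers of the kind used with mutation-accumulation data. It also does not collapse reverse-complement contexts, which a real analysis would usually do.

```python
from collections import Counter


def context_rates(reference, mutation_positions, generations=1, lines=1):
    """Mutations per site per generation for each trinucleotide context.

    reference:          reference sequence (string)
    mutation_positions: 0-based positions of observed single-nucleotide mutations
    generations, lines: hypothetical normalisers for mutation-accumulation designs
    """
    ref = reference.upper()
    # How often each trinucleotide context occurs around an internal site.
    site_counts = Counter(ref[i - 1:i + 2] for i in range(1, len(ref) - 1))
    # How many observed mutations fall in each context (edge sites are skipped).
    mut_counts = Counter(ref[p - 1:p + 2] for p in mutation_positions
                         if 0 < p < len(ref) - 1)
    return {ctx: mut_counts[ctx] / (n * generations * lines)
            for ctx, n in site_counts.items()}
```

With `context_rates("ACGTACGT", [2])`, the mutated G sits in a CGT context that occurs twice, so CGT gets a rate of 0.5 while every other context gets 0.0.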
Affiliation(s)
- David L Patton
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, 9201 University City Boulevard, Charlotte, NC, 28223, USA
- Thomas Cardenas
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, 9201 University City Boulevard, Charlotte, NC, 28223, USA
- Perrin Mele
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, 9201 University City Boulevard, Charlotte, NC, 28223, USA
- Jon Navarro
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, 9201 University City Boulevard, Charlotte, NC, 28223, USA
- Way Sung
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, 9201 University City Boulevard, Charlotte, NC, 28223, USA
5
Morton BR. Substitution rate heterogeneity across hexanucleotide contexts in noncoding chloroplast DNA. G3 (Bethesda) 2022; 12:6608088. [PMID: 35699494 PMCID: PMC9339276 DOI: 10.1093/g3journal/jkac150] [Received: 03/27/2022] [Accepted: 06/07/2022] [Indexed: 11/13/2022]
Abstract
Substitutions between closely related noncoding chloroplast DNA sequences are studied with respect to the composition of the three bases on each side of the substitution, that is, the hexanucleotide context. Substitution rate varies about 100-fold among contexts, particularly for substitutions of A and T. Rate heterogeneity of transitions differs from that of transversions, resulting in more than 200-fold variation in the transition:transversion bias. The data are consistent with a CpG effect, and both the A + T content and the arrangement of purines/pyrimidines along the same DNA strand are shown to correlate with rate variation. Expected equilibrium A + T content ranges from 36.4% to 82.8% across contexts, while G–C skew ranges from −77.4 to 72.2 and A–T skew from −63.9 to 68.2. The predicted equilibria are associated with specific features of the hexanucleotide context and agree closely with the observed context-dependent compositions. Finally, by controlling for the content of nucleotides closer to the substitution site, it is shown that both the third and fourth nucleotides removed on each side of the substitution directly influence substitution dynamics at that site. Overall, the results demonstrate that noncoding sites in different contexts are evolving along very different trajectories and that substitution dynamics are far more complex than typically assumed. This has important implications for several types of sequence analysis, particularly analyses of natural selection, and the context-dependent substitution matrices developed here can be applied in future analyses.
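The three ingredients of the analysis above (the hexanucleotide context of a site, the transition/transversion distinction, and compositional skew) can be sketched in Python as follows. The function names are hypothetical, and the ×100 scaling of skew is an assumption inferred from the reported ranges (for example −77.4 to 72.2), not a formula stated in the abstract.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def hexanucleotide_context(seq, pos):
    """The three bases on each side of pos; the substituted site itself is excluded."""
    if pos < 3 or pos + 4 > len(seq):
        raise ValueError("need three flanking bases on both sides of pos")
    return seq[pos - 3:pos] + seq[pos + 1:pos + 4]


def is_transition(ref, alt):
    """Purine<->purine or pyrimidine<->pyrimidine change; anything else is a transversion."""
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES


def skew(seq, a, b):
    """(n_a - n_b) / (n_a + n_b), scaled by 100 (assumed scale), e.g. G-C or A-T skew."""
    na, nb = seq.count(a), seq.count(b)
    return 100 * (na - nb) / (na + nb) if na + nb else 0.0
```

Tallying substitutions by `hexanucleotide_context` and splitting them with `is_transition` gives the per-context transition and transversion counts from which context-specific rates and biases can be estimated.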
Affiliation(s)
- Brian R Morton
- Department of Biology, Barnard College, Columbia University, New York, NY 10027, USA
6
Bai Q, He B, Cai Y, Lian H, Zhang Q, Liang D, Wang Y. Genetic Diversity and Population Structure of Schima superba From Southern China. Front Ecol Evol 2022. [DOI: 10.3389/fevo.2022.879512] [Indexed: 11/13/2022]
Abstract
The tree Schima superba is important for afforestation and fire prevention in southern China, and its wood is also used for furniture and construction. However, the lack of genetic background and genomic information for this species has slowed improvement of wood yield and quality. Here, we aimed to discover genome-wide single nucleotide polymorphisms (SNPs) in 302 S. superba accessions collected from southern China and to use these SNPs to investigate population structure. Using genotyping by sequencing, a total of 785 high-quality SNP markers (minor allele frequency [MAF] ≥ 0.05) were identified from 302 accessions collected from seven geographical locations. Population structure analyses and principal coordinate analyses (PCoAs) indicated that these germplasm resources can be clearly separated into different populations. Accessions from Yunnan (YN) and Guangxi (GX) fell into the same population, separate from those from Guangdong (GD), which indicates that these two regions should be regarded as major provenances of this species. In addition, two independent core germplasm sets with abundant genetic polymorphism were constructed to support breeding work. The identification of SNP markers, analyses of population genetics, and construction of core germplasm sets will greatly promote molecular breeding of S. superba.
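The MAF ≥ 0.05 threshold mentioned above is a standard SNP quality filter. A minimal Python sketch follows, assuming diploid genotypes coded 0/1/2 with `None` for missing calls; the function names and the dictionary-of-sites layout are hypothetical, and real pipelines normally apply this together with missing-data and depth filters.

```python
def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele from genotypes coded 0/1/2 (None = missing)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))  # alternative-allele frequency
    return min(p, 1 - p)


def filter_by_maf(sites, threshold=0.05):
    """Keep SNP sites whose minor allele frequency is at least threshold.

    sites: dict mapping a site name to its list of per-accession genotypes
    """
    return {name: g for name, g in sites.items()
            if minor_allele_frequency(g) >= threshold}
```

A monomorphic site such as `[0, 0, 0, 0]` has MAF 0 and is removed, while `[0, 1, 2, 1]` (MAF 0.5) is kept.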
7
Wyant SR, Rodriguez MF, Carter CK, Parrott WA, Jackson SA, Stupar RM, Morrell PL. Fast neutron mutagenesis in soybean enriches for small indels and creates frameshift mutations. G3 (Bethesda) 2022; 12:jkab431. [PMID: 35100358 PMCID: PMC9335934 DOI: 10.1093/g3journal/jkab431] [Received: 11/05/2021] [Accepted: 11/14/2021] [Indexed: 11/13/2022]
Abstract
The mutagenic effects of ionizing radiation have been used for decades to create novel variants in experimental populations. Fast neutron (FN) bombardment as a mutagen has been especially widespread in plants, with extensive reports describing the induction of large structural variants, i.e., deletions, insertions, inversions, and translocations. However, the full spectrum of FN-induced mutations is poorly understood. We contrast small insertions and deletions (indels) observed in 27 soybean lines subjected to FN irradiation with the standing indels identified in 107 diverse soybean lines. We use the same populations to contrast the nature and context (bases flanking a nucleotide change) of single-nucleotide variants. The accumulation of new single-nucleotide changes in FN lines is marginally higher than expected from spontaneous mutation. In FN-treated lines and in standing variation, C→T transitions and the corresponding reverse-complement G→A transitions are the most abundant and occur most frequently in a CpG local context. These data indicate that most SNPs identified in FN lines likely derive from spontaneous de novo processes in generations following mutagenesis rather than from the FN irradiation itself. However, small indels in FN lines differ from standing variants. Short insertions, from 1 to 6 bp, are less abundant than in standing variation. Short deletions are more abundant and prone to induce frameshift mutations that should disrupt the structure and function of encoded proteins. These findings indicate that FN irradiation generates numerous small indels, increasing the abundance of loss-of-function mutations that impact single genes.
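Two classifications underlie the comparison above: whether an indel shifts the reading frame, and whether a C→T or G→A change sits in a CpG context. A minimal Python sketch follows, assuming VCF-style REF/ALT allele strings and a plain reference sequence with 0-based positions; the function names are hypothetical, and the frameshift flag is only meaningful when the indel lies entirely within a coding sequence.

```python
def classify_variant(ref, alt):
    """Return (variant type, indel length, frameshift if inside a CDS) for a REF/ALT pair."""
    if len(ref) == len(alt):
        return ("SNV" if len(ref) == 1 else "MNV", 0, False)
    length = abs(len(alt) - len(ref))
    kind = "insertion" if len(alt) > len(ref) else "deletion"
    # Indels whose length is not a multiple of 3 shift the downstream reading frame.
    return (kind, length, length % 3 != 0)


def is_cpg_transition(seq, pos, alt):
    """True for C->T at a C followed by G, or G->A at a G preceded by C (the other strand)."""
    base = seq[pos]
    if base == "C" and alt == "T":
        return pos + 1 < len(seq) and seq[pos + 1] == "G"
    if base == "G" and alt == "A":
        return pos > 0 and seq[pos - 1] == "C"
    return False
```

For example, a 2 bp insertion (`"A"` to `"ATT"`) is flagged as a frameshift, whereas a 3 bp deletion (`"ATTT"` to `"A"`) is in-frame.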
Affiliation(s)
- Skylar R Wyant
- Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92697, USA
- M Fernanda Rodriguez
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, USA
- Corey K Carter
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, USA
- Wayne A Parrott
- Department of Crop and Soil Sciences, University of Georgia, Athens, GA 30602, USA
- Scott A Jackson
- Department of Crop and Soil Sciences, University of Georgia, Athens, GA 30602, USA
- Robert M Stupar
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, USA
- Peter L Morrell
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, USA
8
Context-Dependent Substitution Dynamics in Plastid DNA Across a Wide Range of Taxonomic Groups. J Mol Evol 2022; 90:44-55. [PMID: 35037071 DOI: 10.1007/s00239-021-10040-2] [Received: 09/20/2021] [Accepted: 12/01/2021] [Indexed: 10/19/2022]
Abstract
The influence of neighboring base composition, or context, on substitution bias at fourfold degenerate coding sites and in intergenic regions of plastid DNA is compared across angiosperms, gymnosperms, ferns, liverworts, chlorophytes, stramenopiles, and rhodophytes. An influence of flanking-base G + C content on the relative rates of transitions and transversions is observed in all lineages and, in some, extends up to four nucleotides from the substitution site. Although context effects were found in all lineages, significant differences were observed between lineages. Overall, the data suggest that context is a general factor affecting mutation bias in plastid DNA but that the dynamics of its influence have evolved over time. Context has similar effects on substitution bias at fourfold degenerate coding sites and at intergenic sites, but there are also small yet significant differences, suggesting that some of these sites may be under selection and that the mutation and/or repair process may differ between coding and noncoding DNA.
9
Genetic Diversity and Population Structure Analysis of the USDA Olive Germplasm Using Genotyping-By-Sequencing (GBS). Genes (Basel) 2021; 12:genes12122007. [PMID: 34946959 PMCID: PMC8701156 DOI: 10.3390/genes12122007] [Received: 11/27/2021] [Revised: 12/10/2021] [Accepted: 12/14/2021] [Indexed: 12/20/2022]
Abstract
Olives are among the most important fruit and woody oil trees, cultivated in many parts of the world. Olive oil is a critical component of the Mediterranean diet because of its importance for heart health. Olives are believed to have been brought to the United States from Mediterranean countries in the 18th century. Despite increasing demand and production area, only a few selected olive varieties are grown in most traditional and new growing regions in the US. Understanding the genetic background would allow new sources of genetic diversity to be incorporated into olive breeding programs to develop regionally adapted varieties for the US market. This study explored the genetic diversity and population structure of 90 olive accessions from the USDA repository, along with six popular varieties, using SNP markers generated by genotyping-by-sequencing (GBS). After quality filtering, 54,075 SNP markers were retained for genetic diversity analysis. The average gene diversity (GD) and polymorphic information content (PIC) values of the SNPs were 0.244 and 0.206, respectively, indicating moderate genetic diversity in the US olive germplasm evaluated in this study. Structure analysis showed that the USDA collection was distributed across seven subpopulations, and 63% of the accessions were grouped into an identifiable subpopulation. Phylogenetic and principal coordinate analyses (PCoA) showed that the subpopulations did not align with geographical origins or climatic zones. Analysis of molecular variance revealed that the major sources of genetic variation were within populations. These findings provide critical information for future olive breeding programs to select genetically distant parents and to facilitate gene identification using genome-wide association studies (GWAS) or marker-assisted selection (MAS) for developing varieties suited to production in the US.
10
Ibrahim Bio Yerima AR, Issoufou KA, Adje CA, Mamadou A, Oselebe H, Gueye MC, Billot C, Achigan-Dako EG. Genome-Wide Scanning Enabled SNP Discovery, Linkage Disequilibrium Patterns and Population Structure in a Panel of Fonio (Digitaria exilis [Kippist] Stapf) Germplasm. Front Sustain Food Syst 2021. [DOI: 10.3389/fsufs.2021.699549] [Indexed: 11/13/2022]
Abstract
White fonio (Digitaria exilis) is a staple food for millions of people in arid and semi-arid areas of West Africa. Knowledge of its nutritional and health benefits, insights into its morphological diversity, and the recent development of genomic resources call for a better understanding of the genetic structure of the extant germplasm gathered throughout the region in order to set up a robust breeding program. In this study, we assessed the genetic diversity and population structure of 259 fonio individuals collected from six West African countries (Nigeria, Benin, Guinea, Mali, Burkina Faso, and Niger) using 688 putative SNP markers out of 21,324 DArTseq-derived SNPs. Due to the inbreeding and small population size, the results revealed a substantial level of genetic variability. Furthermore, two clusters were found irrespective of the geographic origins of the accessions. Moreover, the high level of linkage disequilibrium (LD) observed between loci resulted from the mating system of the crop, which is often associated with a low recombination rate. These findings fill gaps in knowledge of the molecular diversity and genetic structure of white fonio germplasm in West Africa, knowledge required for applying genomic tools that could speed up genetic gain in fonio millet breeding for complex traits such as yield and nutrient content.
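Linkage disequilibrium between two loci is commonly summarised as r². A minimal Python sketch follows, assuming unphased genotypes coded 0/1/2 and using the squared Pearson correlation of allele dosages, a common approximation when haplotypes are unknown; in a largely selfing crop most individuals are homozygous, so dosages approximate haplotypes closely. The function name is hypothetical, and the abstract does not state which LD estimator the study used.

```python
import math


def genotype_r2(x, y):
    """Squared Pearson correlation between two loci's 0/1/2 dosages.

    Individuals missing a call (None) at either locus are dropped.
    Returns NaN when fewer than two individuals remain or a locus is monomorphic.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return math.nan
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return math.nan  # r^2 is undefined when either locus does not vary
    return sxy * sxy / (sxx * syy)
```

Perfectly coupled or perfectly repulsed loci both give r² = 1, while independent loci give values near 0; LD decay is then usually plotted as r² against physical distance between marker pairs.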
11
Elsayed WM, Elmogy M, El-Desouky BS. DNA sequence reconstruction based on innovated hybridization technique of probabilistic cellular automata and particle swarm optimization. Inf Sci (N Y) 2020; 547:828-840. [PMID: 32895580 PMCID: PMC7467128 DOI: 10.1016/j.ins.2020.08.102] [Received: 05/24/2019] [Revised: 08/24/2020] [Accepted: 08/27/2020] [Indexed: 11/24/2022]
Abstract
DNA sequence reconstruction is a challenging research problem in computational biology. The evolution of DNA is too complex to be characterized by a few parameters, so a modeling approach is needed for analyzing DNA patterns. In this paper, we propose a novel framework for DNA pattern analysis consisting of two main stages: the first analyzes DNA sequence evolution, and the second performs the reconstruction. We used cellular automata (CA) rules to analyze and predict the DNA sequence. We then introduce a modified reconstruction procedure based on Probabilistic Cellular Automata (PCA) integrated with the Particle Swarm Optimization (PSO) algorithm. This integration makes the proposed framework more efficient and yields optimal transition rules. Our model rests on the hypothesis that mutations are probabilistic events, so their evolution can be simulated as a PCA model. The main objective of this paper is to analyze various DNA sequences to predict the changes (mutations) that occur in DNA during evolution. We used a similarity score as a fitness measure to detect symmetry relations, which is appropriate for numerous extremely long sequences. Results are given for CpG methylation-deamination processes; CpG sites are regions of DNA where a guanine nucleotide follows a cytosine nucleotide in the linear sequence of bases. DNA evolution is handled as evolving colored paradigms. Incorporating probabilistic components therefore helps to produce a tool capable of predicting the likelihood of specific mutations, and the model shows its capability in dealing with complex relations.
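The idea of treating mutation as a probabilistic cellular automaton can be illustrated with a single synchronous update step in Python. This is a simplified sketch, not the authors' model: the rule set (CpG cytosines deaminate to T with probability `p_cpg`, all other cells mutate to a random different base with probability `p_background`) and the parameter names are assumptions, G→A on the opposite strand is ignored, and the PSO stage that tunes transition rules is not included.

```python
import random

BASES = "ACGT"


def pca_step(seq, p_cpg, p_background, rng):
    """One synchronous update of a 1-D probabilistic cellular automaton over DNA.

    Each cell inspects its right-hand neighbour in the current state:
    a C followed by G (a CpG site) becomes T with probability p_cpg,
    modelling methylation-deamination; every other cell mutates to a
    random different base with probability p_background.
    """
    out = []
    for i, base in enumerate(seq):
        right = seq[i + 1] if i + 1 < len(seq) else ""
        if base == "C" and right == "G":
            out.append("T" if rng.random() < p_cpg else base)
        elif rng.random() < p_background:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)
```

Because all cells read the previous state, the update is synchronous as in a classical CA. Iterating `pca_step` with a seeded `random.Random` and comparing the resulting sequences against observed ones is one way such a model's parameters could be fitted, for example by a swarm optimiser.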
Affiliation(s)
- Wesam M Elsayed
- Mathematics Dept., Faculty of Science, Mansoura University, Mansoura, Egypt
- Mohammed Elmogy
- Information Technology Dept., Faculty of Computers and Information, Mansoura University, Mansoura, Egypt
- B S El-Desouky
- Mathematics Dept., Faculty of Science, Mansoura University, Mansoura, Egypt
12
Luo Z, Brock J, Dyer JM, Kutchan T, Schachtman D, Augustin M, Ge Y, Fahlgren N, Abdel-Haleem H. Genetic Diversity and Population Structure of a Camelina sativa Spring Panel. Front Plant Sci 2019; 10:184. [PMID: 30842785 PMCID: PMC6391347 DOI: 10.3389/fpls.2019.00184] [Received: 09/19/2018] [Accepted: 02/05/2019] [Indexed: 05/20/2023]
Abstract
There is a need to explore renewable alternatives (e.g., biofuels) that can produce energy sources to help reduce reliance on fossil oils. In addition, the consumption of fossil oils adversely affects the environment and human health via the generation of wastewater, greenhouse gases, and waste solids. Camelina sativa, which originated in southeastern Europe and southwestern Asia, is being re-embraced as an industrial oilseed crop due to its high seed oil content (36-47%) and high unsaturated fatty acid composition (>90%), which are suitable for jet fuel, biodiesel, high-value lubricants, and animal feed. C. sativa's agronomic advantages include short time to maturation, low water and nutrient requirements, adaptability to adverse environmental conditions, and resistance to common pests and pathogens. These characteristics make it an ideal crop for sustainable agricultural systems and regions of marginal land. However, the lack of genetic and genomic resources has slowed the enhancement of this emerging oilseed crop and exploration of its full agronomic and breeding potential. Here, a core of 213 spring C. sativa accessions was collected and genotyped. The genotypic data were used to characterize genetic diversity and population structure, to infer how natural selection and plant breeding may have affected the formation and differentiation of C. sativa natural populations, and to assess how the genetic diversity of this species can be used in future breeding efforts. A total of 6,192 high-quality single nucleotide polymorphisms (SNPs) were identified using genotyping-by-sequencing (GBS) technology. The average polymorphism information content (PIC) value of 0.29 indicates moderate genetic diversity for the C. sativa spring panel evaluated in this report. Population structure and principal coordinates analyses (PCoA) based on SNPs revealed two distinct subpopulations.
Subpopulation 1 (POP1) contains accessions that mainly originated from Germany, while the majority of POP2 accessions (>75%) were collected from Eastern Europe. Analysis of molecular variance (AMOVA) identified 4% variance among and 96% variance within subpopulations, indicating high gene exchange (or low genetic differentiation) between the two subpopulations. These findings provide important information for future allele/gene identification using genome-wide association studies (GWAS) and marker-assisted selection (MAS) to enhance genetic gain in C. sativa breeding programs.
Affiliation(s)
- Zinan Luo
- U.S. Arid Land Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Maricopa, AZ, United States
- *Correspondence: Zinan Luo, Hussein Abdel-Haleem
- Jordan Brock
- Department of Biology, Washington University in St. Louis, St. Louis, MO, United States
- John M. Dyer
- U.S. Arid Land Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Maricopa, AZ, United States
- Toni Kutchan
- Donald Danforth Plant Science Center, St. Louis, MO, United States
- Daniel Schachtman
- Department of Agronomy and Horticulture, University of Nebraska, Lincoln, NE, United States
- Megan Augustin
- Donald Danforth Plant Science Center, St. Louis, MO, United States
- Yufeng Ge
- Department of Biological and Agricultural Engineering, University of Nebraska, Lincoln, NE, United States
- Noah Fahlgren
- Donald Danforth Plant Science Center, St. Louis, MO, United States
- Hussein Abdel-Haleem
- U.S. Arid Land Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Maricopa, AZ, United States
13
Shea DJ, Shimizu M, Itabashi E, Miyaji N, Miyazaki J, Osabe K, Kaji M, Okazaki K, Fujimoto R. Genome re-sequencing, SNP analysis, and genetic mapping of the parental lines of a commercial F1 hybrid cultivar of Chinese cabbage. Breed Sci 2018; 68:375-380. [PMID: 30100805 PMCID: PMC6081294 DOI: 10.1270/jsbbs.17124] [Received: 10/17/2017] [Accepted: 02/04/2018] [Indexed: 06/08/2023]
Abstract
Genome-wide characterization of single nucleotide polymorphisms (SNPs) between cultivars or between inbred lines contributes to the creation of genetic markers that are important for plant breeding. Functional markers derived from polymorphisms within genes that affect phenotypic variation are especially valuable in plant breeding. Here, we report the genome re-sequencing and analysis of the two parental inbred lines of the commercial F1 hybrid Chinese cabbage cultivar "W77". Through genome-wide identification and classification of the SNPs and indels present in each parental line, we identified about 1,500 putative non-functional genes in each parent. We designed cleaved amplified polymorphic sequence (CAPS) markers using specific mutations found at EcoRI restriction sites in the parental lines and confirmed their Mendelian segregation by constructing a linkage map using 96 F2 plants derived from the F1 hybrid cultivar "W77". Our results and data will be a useful genomic resource for future studies of gene function and metagenomic studies in Chinese cabbage.
Affiliation(s)
- Daniel J. Shea
- Graduate School of Science and Technology, Niigata University, Ikarashi-ninocho, Niigata 950-2181, Japan
- Motoki Shimizu
- Iwate Biotechnology Research Center, Narita, Kitakami, Iwate 024-0003, Japan
- Etsuko Itabashi
- Institute of Vegetable and Floriculture Science, NARO, Kusawa, Ano, Tsu, Mie 514-2392, Japan
- Naomi Miyaji
- Graduate School of Agricultural Science, Kobe University, Rokkodai, Nada-ku, Kobe, Hyogo 657-8501, Japan
- Junji Miyazaki
- Centre for AgriBioscience, Department of Animal, Plant and Soil Sciences, La Trobe University, Melbourne, VIC, Australia
- Kenji Osabe
- Plant Epigenetics Unit, Okinawa Institute of Science and Technology Graduate University, Onna-son, Okinawa 904-0495, Japan
- Makoto Kaji
- Watanabe Seed Co., Ltd., Machiyashiki, Misato-cho, Miyagi 987-0003, Japan
- Keiichi Okazaki
- Graduate School of Science and Technology, Niigata University, Ikarashi-ninocho, Niigata 950-2181, Japan
- Ryo Fujimoto
- Graduate School of Agricultural Science, Kobe University, Rokkodai, Nada-ku, Kobe, Hyogo 657-8501, Japan
14
Paul P, Malakar AK, Chakraborty S. Codon usage vis-a-vis start and stop codon context analysis of three dicot species. J Genet 2018; 97:97-107. [PMID: 29666329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
To understand the variation in genomic composition and its effect on codon usage, we performed a comparative analysis of codon usage and nucleotide usage in the genes of three dicots, Glycine max, Arabidopsis thaliana and Medicago truncatula. The dicot genes were found to be A/T rich, with predominantly A-ending and/or T-ending codons. GC3s directly mirrors the usage pattern of global GC content. Relative synonymous codon usage analysis suggests that the high usage frequency of A/T- over G/C-containing codons in these AT-rich dicot genomes is due to compositional constraint acting as a factor in codon usage bias. Odds ratio analysis identified the dinucleotides TpG, TpC, GpA, CpA and CpT as over-represented, and CpG and TpA as under-represented. The results of the (NcExp-NcObs)/NcExp plot suggest that selection pressure, as distinct from mutation, played a significant role in shaping codon usage in these dicots. PR2 analysis revealed the significant role of selection pressure on codon usage. Analysis of variance of codon usage at start and stop sites showed variation in codon selection at these sites. This study provides evidence that the dicot genes were subjected to compositional selection pressure.
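The odds-ratio test mentioned above compares the observed frequency of each dinucleotide with the product of the frequencies of its two component bases: rho_XY = f(XY) / (f(X) * f(Y)). Below is a minimal sketch. It assumes a single clean A/C/G/T sequence and uses 0.78 and 1.23 as the under- and over-representation cut-offs. These cut-offs are a commonly used convention assumed here, not necessarily the thresholds applied in this study.

```python
# Minimal sketch of the dinucleotide odds ratio rho_XY = f(XY) / (f(X) * f(Y)).
# Assumes one clean A/C/G/T sequence; the 0.78 / 1.23 cut-offs are an assumed convention.
from collections import Counter


def dinucleotide_odds_ratios(seq: str) -> dict[str, float]:
    """Odds ratio for every dinucleotide observed in seq."""
    seq = seq.upper()
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_mono, n_di = len(seq), len(seq) - 1
    return {
        xy: (count / n_di) / ((mono[xy[0]] / n_mono) * (mono[xy[1]] / n_mono))
        for xy, count in di.items()
    }


def classify(rho: float, low: float = 0.78, high: float = 1.23) -> str:
    """Label a dinucleotide as under-, over- or normally represented."""
    if rho < low:
        return "under-represented"
    if rho > high:
        return "over-represented"
    return "normal"


ratios = dinucleotide_odds_ratios("ATGCATGCAATTCGAT")
for xy in sorted(ratios):
    print(xy, round(ratios[xy], 2), classify(ratios[xy]))
```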
Affiliation(s)
- Prosenjit Paul
- Department of Biotechnology, Assam University, Silchar 788 011, India.
15
Paul P, Malakar AK, Chakraborty S. Codon usage vis-a-vis start and stop codon context analysis of three dicot species. J Genet 2018. [DOI: 10.1007/s12041-018-0892-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 12/23/2022]
16
Paul P, Malakar AK, Chakraborty S. Compositional bias coupled with selection and mutation pressure drives codon usage in Brassica campestris genes. Food Sci Biotechnol 2017; 27:725-733. [PMID: 30263798 DOI: 10.1007/s10068-017-0285-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 07/29/2017] [Revised: 11/28/2017] [Accepted: 12/03/2017] [Indexed: 11/25/2022]
Abstract
The plant Brassica campestris includes the vegetables turnip and Chinese cabbage, both of considerable economic importance. Here, we analysed the codon usage bias of 116 protein-coding genes of B. campestris. Neutrality analysis showed that B. campestris genes span a wide range of GC3s, and a significant correlation was observed between GC12 and GC3. The Nc versus GC3s plot showed a few genes on or close to the expected curve, but most points were scattered far from it. Correspondence analysis of codon usage revealed that the position of codons in multidimensional space depends entirely on the presence of A and T at the synonymous third codon position. Taken together, these results suggest that compositional bias, along with selection (major) and mutation pressure (minor), shapes the codon usage pattern of the protein-coding genes in Brassica campestris.
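The "expected curve" in an Nc versus GC3s plot is normally computed with Wright's formula: Nc_exp = 2 + s + 29 / (s^2 + (1 - s)^2), where s is GC3s. This is the curve expected if GC3s composition alone determined codon usage. Below is a minimal sketch of that curve and of the relative deviation (Nc_exp - Nc_obs) / Nc_exp. The gene values in the example are hypothetical.

```python
# Minimal sketch of the expected Nc curve used in Nc-vs-GC3s plots (Wright's formula):
#   Nc_exp = 2 + s + 29 / (s^2 + (1 - s)^2), where s = GC3s.
# Genes well below the curve are usually read as shaped by more than mutational bias alone.


def expected_nc(gc3s: float) -> float:
    """Expected effective number of codons if GC3s composition alone drove codon usage."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


def nc_deviation(nc_obs: float, gc3s: float) -> float:
    """(Nc_exp - Nc_obs) / Nc_exp: near 0 means the gene sits on the expected curve."""
    nc_exp = expected_nc(gc3s)
    return (nc_exp - nc_obs) / nc_exp


# Hypothetical gene values, for illustration only.
print(expected_nc(0.5))         # 60.5
print(nc_deviation(48.4, 0.5))  # roughly 0.2
```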
Affiliation(s)
- Prosenjit Paul
- Department of Biotechnology, Assam University, Silchar, Assam 788011 India
- Arup Kumar Malakar
- Department of Biotechnology, Assam University, Silchar, Assam 788011 India
17
Contributions of Zea mays subspecies mexicana haplotypes to modern maize. Nat Commun 2017; 8:1874. [PMID: 29187731 PMCID: PMC5707364 DOI: 10.1038/s41467-017-02063-5] [Citation(s) in RCA: 66] [Impact Index Per Article: 9.4] [Received: 01/14/2017] [Accepted: 11/03/2017] [Indexed: 11/09/2022]
Abstract
Maize was domesticated from lowland teosinte (Zea mays ssp. parviglumis), but the contribution of highland teosinte (Zea mays ssp. mexicana, hereafter mexicana) to modern maize is not clear. Here, two genomes, for Mo17 (a modern maize inbred) and mexicana, are assembled using a meta-assembly strategy after sequencing of 10 lines derived from a maize-teosinte cross. Comparative analyses reveal a high level of diversity between Mo17, B73, and mexicana, including three megabase-scale structural rearrangements. The maize spontaneous mutation rate is estimated to be 2.17 × 10^-8 to 3.87 × 10^-8 per site per generation, with a nonrandom distribution across the genome. A higher deleterious mutation rate is observed in the pericentromeric regions and might be caused by differences in recombination frequency. Over 10% of the maize genome shows evidence of introgression from the mexicana genome, suggesting that mexicana contributed to maize adaptation and improvement. Our data offer a rich resource for constructing the pan-genome of Zea mays and for the genetic improvement of modern maize varieties.
18
Hoekstra PH, Wieringa JJ, Smets E, Brandão RD, Lopes JDC, Erkens RHJ, Chatrou LW. Correlated evolutionary rates across genomic compartments in Annonaceae. Mol Phylogenet Evol 2017; 114:63-72. [PMID: 28578201 DOI: 10.1016/j.ympev.2017.05.026] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 03/12/2017] [Revised: 05/29/2017] [Accepted: 05/29/2017] [Indexed: 11/28/2022]
Abstract
The molecular clock hypothesis is an important concept in biology. Deviations from a constant rate of nucleotide substitution have been found widely among lineages, genomes, genes and individual sites. Phylogenetic research can accommodate these differences by applying specific models of evolution. However, lineage-specific rate heterogeneity can generate bi- or multimodal distributions of substitution rates across the branches of a tree, and this may mislead phylogenetic inferences with currently available models. The plant family Annonaceae is an excellent case for studying lineage-specific rate heterogeneity. The two major sister subfamilies, Annonoideae and Malmeoideae, show great discrepancies in branch lengths. We used high-throughput sequencing data of 72 genes, 99 spacers and 16 introns from 24 chloroplast genomes and nuclear ribosomal DNA of 23 species to study the molecular rate of evolution in Annonaceae. In all analyses, longer branch lengths and/or higher substitution rates were found for the Annonoideae compared to the Malmeoideae. Chloroplast genome length varied widely in the Annonaceae, ranging from 175,684 to 201,723 bp in Annonoideae and from 152,357 to 170,985 bp in Malmeoideae, mostly reflecting variation in inverted-repeat length. The Annonoideae showed a higher GC content in the conserved parts of the chloroplast genome and higher omega (dN/dS) ratios than the Malmeoideae, which could indicate less stringent purifying selection, a pattern that has been found in groups with small population sizes. This study generates new insights into the processes causing lineage-specific rate heterogeneity, which could lead to improved phylogenetic methods.
Affiliation(s)
- Paul H Hoekstra
- Naturalis Biodiversity Center, National Herbarium of the Netherlands, Darwinweg 2, 2300 RA Leiden, The Netherlands; Wageningen University & Research, Biosystematics Group, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands.
- Jan J Wieringa
- Naturalis Biodiversity Center, National Herbarium of the Netherlands, Darwinweg 2, 2300 RA Leiden, The Netherlands; Wageningen University & Research, Biosystematics Group, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands.
- Erik Smets
- Naturalis Biodiversity Center, National Herbarium of the Netherlands, Darwinweg 2, 2300 RA Leiden, The Netherlands; Katholieke Universiteit Leuven, Ecology, Evolution and Biodiversity Conservation Section, Kasteelpark Arenberg 31, Box 2435, 3001 Leuven, Belgium.
- Rita D Brandão
- Maastricht University, Maastricht Science Programme, Kapoenstraat 2, 6211 KW Maastricht, The Netherlands.
- Jenifer de Carvalho Lopes
- Universidade de São Paulo, Instituto de Biociências, Departamento de Botânica, Rua do Matão 277, 05508-090 São Paulo, SP, Brazil.
- Roy H J Erkens
- Maastricht University, Maastricht Science Programme, Kapoenstraat 2, 6211 KW Maastricht, The Netherlands.
- Lars W Chatrou
- Wageningen University & Research, Biosystematics Group, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands.
19
Waiho K, Fazhan H, Shahreza MS, Moh JHZ, Noorbaiduri S, Wong LL, Sinnasamy S, Ikhwanuddin M. Transcriptome Analysis and Differential Gene Expression on the Testis of Orange Mud Crab, Scylla olivacea, during Sexual Maturation. PLoS One 2017; 12:e0171095. [PMID: 28135340 PMCID: PMC5279790 DOI: 10.1371/journal.pone.0171095] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Received: 06/17/2016] [Accepted: 01/15/2017] [Indexed: 01/04/2023]
Abstract
Adequate genetic information is essential for sustainable crustacean fisheries and aquaculture management. The commercially important orange mud crab, Scylla olivacea, is prevalent in the Southeast Asian region and is highly sought after. Although it is a suitable aquaculture candidate, full domestication of this species is hampered by limited knowledge of the sexual maturation process and its underlying molecular mechanisms, especially in males. To date, no whole-genome data have been reported for S. olivacea, and previously published transcriptome data for this species focus primarily on females and on the role of the central nervous system in reproductive development. Here, de novo transcriptome sequencing was performed on the testes of S. olivacea at the immature, maturing and mature stages. Approximately 144 million high-quality reads were generated and assembled de novo into 160,569 transcripts with a total length of 142.2 Mb. Approximately 15–23% of the assembled transcripts were annotated against public protein sequence databases (UniProt, InterPro, Pfam and the Drosophila melanogaster protein database) and categorised with Gene Ontology (GO) terms. A total of 156,181 high-quality single-nucleotide polymorphisms (SNPs) were mined from the transcriptome data. Transcriptome comparison among testes at different maturation stages revealed one gene (a beta crystallin-like gene) with the most significant differential expression: up-regulated in the immature stage and down-regulated in the maturing and mature stages. This was further validated by qRT-PCR. In conclusion, a comprehensive testis transcriptome of orange mud crabs at different maturation stages was obtained. This report provides a valuable resource for understanding this species’ genome structure and biology, as expressed and controlled by its gonads.
Affiliation(s)
- Khor Waiho
- Institute of Tropical Aquaculture, Universiti Malaysia Terengganu, Kuala Terengganu, Terengganu, Malaysia
- * E-mail: (KW); (MI)
- Hanafiah Fazhan
- Institute of Tropical Aquaculture, Universiti Malaysia Terengganu, Kuala Terengganu, Terengganu, Malaysia
- Md Sheriff Shahreza
- Institute of Tropical Aquaculture, Universiti Malaysia Terengganu, Kuala Terengganu, Terengganu, Malaysia
- School of Fisheries and Aquaculture Sciences, Universiti Malaysia Terengganu, Kuala Terengganu, Terengganu, Malaysia
- Julia Hwei Zhong Moh
- Institute of Tropical Aquaculture, Universiti Malaysia Terengganu, Kuala Terengganu, Terengganu, Malaysia
- Shaibani Noorbaiduri
- Institute of Tropical Aquaculture, Universiti Malaysia Terengganu, Kuala Terengganu, Terengganu, Malaysia
- Li Lian Wong
- Institute of Tropical Aquaculture, Universiti Malaysia Terengganu, Kuala Terengganu, Terengganu, Malaysia
- Saranya Sinnasamy
- Institute of Marine Biotechnology, Universiti Malaysia Terengganu, Kuala Terengganu, Terengganu, Malaysia
- Mhd Ikhwanuddin
- Institute of Tropical Aquaculture, Universiti Malaysia Terengganu, Kuala Terengganu, Terengganu, Malaysia
- * E-mail: (KW); (MI)
20
Kusumi J, Tsumura Y, Tachida H. Evolutionary rate variation in two conifer species, Taxodium distichum (L.) Rich. var. distichum (baldcypress) and Cryptomeria japonica (Thunb. ex L.f.) D. Don (Sugi, Japanese cedar). Genes Genet Syst 2015; 90:305-15. [PMID: 26687861 DOI: 10.1266/ggs.14-00079] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022]
Abstract
With the advance of sequencing technologies, large-scale data of expressed sequence tags and full-length cDNA sequences have been reported for several conifer species. Comparative analyses of evolutionary rates among diverse taxa provide insights into taxon-specific molecular evolutionary features and into the origin of variation in evolutionary rates within genomes and between species. Here, we estimated evolutionary rates in two conifer species, Taxodium distichum and Cryptomeria japonica, to illuminate the molecular evolutionary features of these species, using hundreds of genes and employing Chamaecyparis obtusa as an outgroup. Our results show that the mutation rates based on synonymous substitution rates (dS) of T. distichum and C. japonica are approximately 0.67 × 10^-9 and 0.59 × 10^-9 per site per year, respectively, which are 15-25 times lower than those of annual angiosperms. We found a significant positive correlation between dS and GC3. This implies that a local mutation bias, such as context dependency of the mutation bias, exists within the genomes of T. distichum and C. japonica, and/or that selection acts on synonymous sites in these species. In addition, the means of the ratios of synonymous to nonsynonymous substitution rate in the two species are almost the same, suggesting that the average intensity of functional constraint is constant between the lineages. Finally, we tested the possibility of positive selection based on the site model, and detected one candidate gene for positive selection.
Affiliation(s)
- Junko Kusumi
- Department of Environmental Changes, Faculty of Social and Cultural Studies, Kyushu University
21
Abstract
A pattern in which nucleotide transitions are favored several fold over transversions is common in molecular evolution. When this pattern occurs among amino acid replacements, explanations often invoke an effect of selection, on the grounds that transitions are more conservative in their effects on proteins. However, the underlying hypothesis of conservative transitions has never been tested directly. Here we assess support for this hypothesis using direct evidence: the fitness effects of mutations in actual proteins measured via individual or paired growth experiments. We assembled data from 8 published studies, ranging in size from 24 to 757 single-nucleotide mutations that change an amino acid. Every study has the statistical power to reveal significant effects of amino acid exchangeability, and most studies have the power to discern a binary conservative-vs-radical distinction. However, only one study suggests that transitions are significantly more conservative than transversions. In the combined set of 1,239 replacements (544 transitions, 695 transversions), the chance that a transition is more conservative than a transversion is 53% (95% confidence interval 50% to 56%), compared with the null expectation of 50%. We show that this effect is not large compared with that of most biochemical factors, and is not large enough to explain the several-fold bias observed in evolution. In short, the available data have the power to verify the “conservative transitions” hypothesis if true, but suggest instead that selection on proteins plays at best a minor role in the observed bias.
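The transition/transversion distinction used above is purely chemical. A transition swaps a purine for a purine (A and G) or a pyrimidine for a pyrimidine (C and T). Every other base change is a transversion. Below is a minimal Python sketch that classifies single-base changes and computes the transition:transversion ratio. The function names are hypothetical. Applied to the combined set above (544 transitions and 695 transversions), the same counting gives a ratio of about 0.78.

```python
# Minimal sketch: classify single-nucleotide changes as transitions or transversions.
# Transitions swap purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
PURINES = frozenset("AG")
BASES = frozenset("ACGT")


def mutation_class(ref: str, alt: str) -> str:
    """Return "transition" or "transversion" for a ref->alt base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}->{alt}")
    return "transition" if (ref in PURINES) == (alt in PURINES) else "transversion"


def ts_tv_ratio(changes: list[tuple[str, str]]) -> float:
    """Transition:transversion ratio for a list of (ref, alt) pairs."""
    ts = sum(mutation_class(r, a) == "transition" for r, a in changes)
    tv = len(changes) - ts
    return ts / tv if tv else float("inf")


print(mutation_class("A", "G"))  # transition
print(mutation_class("C", "A"))  # transversion
print(ts_tv_ratio([("A", "G"), ("C", "T"), ("G", "C")]))  # 2.0
```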
Affiliation(s)
- Arlin Stoltzfus
- Institute for Bioscience and Biotechnology Research, Rockville, MD; Genome-scale Measurements Group, National Institute of Standards and Technology, Gaithersburg, MD
- Ryan W Norris
- Department of Evolution, Ecology and Organismal Biology, The Ohio State University
22
Doddamani D, Khan AW, Katta MAVSK, Agarwal G, Thudi M, Ruperao P, Edwards D, Varshney RK. CicArVarDB: SNP and InDel database for advancing genetics research and breeding applications in chickpea. Database (Oxford) 2015; 2015:bav078. [PMID: 26289427 PMCID: PMC4541373 DOI: 10.1093/database/bav078] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 11/03/2014] [Accepted: 07/22/2015] [Indexed: 11/12/2022]
Abstract
Molecular markers are valuable tools for breeders to help accelerate crop improvement. High-throughput sequencing technologies facilitate the discovery of large-scale variations such as single nucleotide polymorphisms (SNPs) and simple sequence repeats (SSRs). Sequencing of the chickpea genome, along with re-sequencing of several chickpea lines, has enabled the discovery of 4.4 million variations including SNPs and InDels. Here we report a repository of 1.9 million variations (SNPs and InDels) anchored on eight pseudomolecules in a custom database, referred to as CicArVarDB, that can be accessed at http://cicarvardb.icrisat.org/. It includes an easy interface for users to select variations around specific regions associated with quantitative trait loci, with embedded webBLAST search and JBrowse visualisation. We hope that this database will be immensely useful for the chickpea research community for both advancing genetics research and breeding applications for crop improvement. Database URL: http://cicarvardb.icrisat.org.
Affiliation(s)
- Dadakhalandar Doddamani
- Research Program Grain Legumes, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502 324, Telangana State, India
- Aamir W Khan
- Research Program Grain Legumes, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502 324, Telangana State, India
- Mohan A V S K Katta
- Research Program Grain Legumes, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502 324, Telangana State, India
- Gaurav Agarwal
- Research Program Grain Legumes, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502 324, Telangana State, India
- Mahendar Thudi
- Research Program Grain Legumes, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502 324, Telangana State, India
- Pradeep Ruperao
- School of Agriculture and Food Sciences, University of Queensland, St Lucia, Queensland, Australia 4072, School of Plant Biology, The University of Western Australia, Perth, Western Australia, Australia 6009 and
- David Edwards
- School of Plant Biology, The University of Western Australia, Perth, Western Australia, Australia 6009 and Institute of Agriculture, The University of Western Australia, Perth, Western Australia, Australia 6009
- Rajeev K Varshney
- Research Program Grain Legumes, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502 324, Telangana State, India, School of Plant Biology, The University of Western Australia, Perth, Western Australia, Australia 6009 and
23
Perlin MH, Amselem J, Fontanillas E, Toh SS, Chen Z, Goldberg J, Duplessis S, Henrissat B, Young S, Zeng Q, Aguileta G, Petit E, Badouin H, Andrews J, Razeeq D, Gabaldón T, Quesneville H, Giraud T, Hood ME, Schultz DJ, Cuomo CA. Sex and parasites: genomic and transcriptomic analysis of Microbotryum lychnidis-dioicae, the biotrophic and plant-castrating anther smut fungus. BMC Genomics 2015; 16:461. [PMID: 26076695 PMCID: PMC4469406 DOI: 10.1186/s12864-015-1660-8] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.7] [Received: 12/18/2014] [Accepted: 05/28/2015] [Indexed: 12/11/2022]
Abstract
Background The genus Microbotryum includes plant pathogenic fungi afflicting a wide variety of hosts with anther smut disease. Microbotryum lychnidis-dioicae infects Silene latifolia and replaces host pollen with fungal spores, exhibiting biotrophy and necrosis associated with altering plant development. Results We determined the haploid genome sequence for M. lychnidis-dioicae and analyzed whole transcriptome data from plant infections and other stages of the fungal lifecycle, revealing the inventory and expression level of genes that facilitate pathogenic growth. Compared to related fungi, an expanded number of major facilitator superfamily transporters and secretory lipases were detected; lipase gene expression was found to be altered by exposure to lipid compounds, which signaled a switch to dikaryotic, pathogenic growth. In addition, while enzymes to digest cellulose, xylan, xyloglucan, and highly substituted forms of pectin were absent, along with depletion of peroxidases and superoxide dismutases that protect the fungus from oxidative stress, the repertoire of glycosyltransferases and of enzymes that could manipulate host development has expanded. A total of 14% of the genome was categorized as repetitive sequences. Transposable elements have accumulated in mating-type chromosomal regions and were also associated across the genome with gene clusters of small secreted proteins, which may mediate host interactions. Conclusions The unique absence of enzyme classes for plant cell wall degradation and maintenance of enzymes that break down components of pollen tubes and flowers provides a striking example of biotrophic host adaptation.
Affiliation(s)
- Michael H Perlin
- Department of Biology, Program on Disease Evolution, University of Louisville, Louisville, KY, 40292, USA.
- Joelle Amselem
- Institut National de la Recherche Agronomique (INRA), Unité de Recherche Génomique Info (URGI), Versailles, France; Institut National de la Recherche Agronomique (INRA), Biologie et gestion des risques en agriculture (BIOGER), Thiverval-Grignon, France.
- Eric Fontanillas
- Ecologie, Systématique et Evolution, Bâtiment 360, Université Paris-Sud, F-91405, Orsay, France; CNRS, F-91405, Orsay, France.
- Su San Toh
- Department of Biology, Program on Disease Evolution, University of Louisville, Louisville, KY, 40292, USA.
- Zehua Chen
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.
- Sebastien Duplessis
- INRA, UMR 1136, Interactions Arbres-Microorganismes, Champenoux, France; UMR 1136, Université de Lorraine, Interactions Arbres-Microorganismes, Vandoeuvre-lès-Nancy, France.
- Bernard Henrissat
- Centre National de la Recherche Scientifique (CNRS), UMR7257, Université Aix-Marseille, 13288, Marseille, France; Department of Biological Sciences, King Abdulaziz University, Jeddah, Saudi Arabia.
- Sarah Young
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.
- Qiandong Zeng
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.
- Elsa Petit
- Ecologie, Systématique et Evolution, Bâtiment 360, Université Paris-Sud, F-91405, Orsay, France; CNRS, F-91405, Orsay, France; Centre National de la Recherche Scientifique (CNRS), UMR7257, Université Aix-Marseille, 13288, Marseille, France.
- Helene Badouin
- Ecologie, Systématique et Evolution, Bâtiment 360, Université Paris-Sud, F-91405, Orsay, France; CNRS, F-91405, Orsay, France.
- Jared Andrews
- Department of Biology, Program on Disease Evolution, University of Louisville, Louisville, KY, 40292, USA.
- Dominique Razeeq
- Department of Biology, Program on Disease Evolution, University of Louisville, Louisville, KY, 40292, USA.
- Toni Gabaldón
- Centre for Genomic Regulation (CRG), Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; Institució Catalana d'Estudis Avançats (ICREA), Barcelona, Spain.
- Hadi Quesneville
- Institut National de la Recherche Agronomique (INRA), Unité de Recherche Génomique Info (URGI), Versailles, France.
- Tatiana Giraud
- Ecologie, Systématique et Evolution, Bâtiment 360, Université Paris-Sud, F-91405, Orsay, France; CNRS, F-91405, Orsay, France.
- Michael E Hood
- Department of Biology, Amherst College, Amherst, MA, 01002, USA.
- David J Schultz
- Department of Biology, Program on Disease Evolution, University of Louisville, Louisville, KY, 40292, USA.
24
Vijayakumar P, Raut AA, Kumar P, Sharma D, Mishra A. De novo assembly and analysis of crow lungs transcriptome. Genome 2015; 57:499-506. [PMID: 25633965 DOI: 10.1139/gen-2014-0122] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/22/2022]
Abstract
The jungle crow (Corvus macrorhynchos) belongs to the order Passeriformes and is important for avian ecological and evolutionary genetics studies. However, transcriptome data for this species are limited. In the present study, we report the characterization of the lung transcriptome of the jungle crow using GS FLX Titanium XLR70. Altogether, 1,510,303 high-quality sequence reads comprising 581,198,230 bases were assembled de novo into 22,169 isotigs (an isotig represents an individual transcript) and 784,009 singletons. Using these isotigs and 581,681 length-filtered (longer than 300 bp) singletons, 20,010 unique protein-coding genes were identified by BLASTx comparison against the nonredundant (nr) protein sequence database. Comparative analysis revealed that 46,604 (70.29%) and 51,642 (72.48%) of the assembled transcripts have significant similarity to zebra finch and chicken RefSeq proteins, respectively. Functional annotation of the unigenes by GO annotation and KEGG pathway mapping recovered diverse biological functions and processes. Transcripts putatively involved in the immune response were identified. Furthermore, 20,599 single nucleotide polymorphisms (SNPs) and 7525 simple sequence repeats (SSRs) were retrieved from the assembled transcript database. This resource should lay an important foundation for future ecological, evolutionary, and conservation genetic studies on this species and related species.
Affiliation(s)
- Periyasamy Vijayakumar
- High Security Animal Disease Laboratory, Indian Veterinary Research Institute, Anand Nagar, Bhopal-462021, Madhya Pradesh, India
25
Huang P, Feldman M, Schroder S, Bahri BA, Diao X, Zhi H, Estep M, Baxter I, Devos KM, Kellogg EA. Population genetics of Setaria viridis, a new model system. Mol Ecol 2014; 23:4912-25. [PMID: 25185718 DOI: 10.1111/mec.12907] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Received: 07/10/2014] [Revised: 08/27/2014] [Accepted: 08/29/2014] [Indexed: 02/03/2023]
Abstract
An extensive survey of the standing genetic variation in natural populations is among the priority steps in developing a species into a model system. In recent years, green foxtail (Setaria viridis), along with its domesticated form foxtail millet (S. italica), has rapidly become a promising new model system for C4 grasses and bioenergy crops, owing to its rapid life cycle, large seed production and small diploid genome, among other traits. However, remarkably little is known about the genetic diversity in natural populations of this species. In this study, we survey the genetic diversity of a worldwide sample of more than 200 S. viridis accessions, using the genotyping-by-sequencing technique. Two distinct genetic groups in S. viridis and a third group resembling S. italica were identified, with considerable admixture among the three groups. We find that the genetic variation of North American S. viridis correlates with both geography and climate and is representative of the total genetic diversity in this species. This pattern may reflect several introduction/dispersal events of S. viridis into North America. We also modelled demographic history and show a signal of recent population decline in one subgroup. Finally, we show that linkage disequilibrium decay is rapid (<45 kb) in our total sample and slow in genetic subgroups. Together, these results provide an in-depth understanding of the pattern of genetic diversity of this new model species on a broad geographic scale. They also provide key guidelines for ongoing and future work including germplasm preservation, local adaptation, crossing designs and genome-wide association studies.
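LD decay is usually summarized by plotting the pairwise r^2 between SNPs against the physical distance that separates them. Below is a minimal sketch of r^2 for a single pair of biallelic loci. It assumes phased haplotypes coded 0/1 at each locus and no missing data. Real genotyping-by-sequencing data would need filtering and missing-data handling, which this sketch omits. It illustrates the statistic and is not the study's pipeline.

```python
# Minimal sketch of the LD statistic r^2 between two biallelic loci.
# Assumes phased haplotypes coded 0/1 at each locus and no missing data.


def ld_r2(haplotypes: list[tuple[int, int]]) -> float:
    """r^2 = D^2 / (pA(1-pA) pB(1-pB)), with D = pAB - pA*pB for alleles coded 1."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    return d * d / denom


print(ld_r2([(0, 0), (1, 1), (0, 0), (1, 1)]))  # 1.0: complete LD
print(ld_r2([(0, 0), (0, 1), (1, 0), (1, 1)]))  # 0.0: no LD
```

To estimate a decay distance, you would compute r^2 for every SNP pair within a window, bin the values by distance, and find where the binned mean falls below a chosen threshold. That threshold is a modelling choice not stated in the abstract.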
Affiliation(s)
- Pu Huang
- Donald Danforth Plant Science Center, 975 North Warson Rd., St. Louis, MO, 63132, USA
26
Transcriptome analysis of the Portunus trituberculatus: de novo assembly, growth-related gene identification and marker discovery. PLoS One 2014; 9:e94055. [PMID: 24722690 PMCID: PMC3983128 DOI: 10.1371/journal.pone.0094055] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.6] [Received: 11/19/2013] [Accepted: 03/11/2014] [Indexed: 11/19/2022]
Abstract
Background The swimming crab, Portunus trituberculatus, is an important farmed species in China and has attracted extensive study, which requires increasing genomic background knowledge. To date, its whole genome has not been sequenced, and transcriptomic information is also scarce for this species. In the present study, we performed de novo transcriptome sequencing with Illumina paired-end technology to produce a comprehensive transcript dataset for the major tissues of Portunus trituberculatus. Results Total RNA was isolated from eyestalk, gill, heart, hepatopancreas and muscle. Equal quantities of RNA from each tissue were pooled to construct a cDNA library. Using Illumina paired-end sequencing, we generated a total of 120,137 transcripts with an average length of 1037 bp. Further assembly analysis showed that all contigs contributed to 87,100 unigenes, of which 16,029 (18.40% of the total) could be matched in the GenBank non-redundant database. Potential genes and their functions were predicted by GO, KEGG pathway mapping and COG analysis. Based on our sequence analysis and published literature, many putative genes with fundamental roles in growth and muscle development, including actin, myosin, tropomyosin, troponin and other potentially important candidate genes, were identified for the first time in this species. Furthermore, 22,673 SSRs and 66,191 high-confidence SNPs were identified in this EST dataset. Conclusion The transcriptome provides invaluable new data as a functional genomics resource for future biological research on Portunus trituberculatus. The data will also guide future functional studies to manipulate or select for genes influencing growth, which should find practical applications in aquaculture breeding programs.
The molecular markers identified in this study will provide a material basis for future genetic linkage and quantitative trait loci analyses, and will be essential for accelerating aquaculture breeding programs with this species.
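The SSR count reported above depends on how a microsatellite is defined. Below is a minimal Python sketch of perfect-repeat SSR detection. The motif lengths (1 to 6 bp) and the minimum repeat numbers in `MIN_REPEATS` are illustrative assumptions modeled on common SSR-mining tools, not the parameters used in this study. Imperfect and compound repeats are not handled.

```python
import re

# Hypothetical thresholds: motif length (bp) -> minimum number of tandem copies.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_compound(motif: str) -> bool:
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq: str, min_repeats: dict = MIN_REPEATS) -> list:
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    covered = [False] * len(seq)
    hits = []
    for k in sorted(min_repeats):  # shorter motifs claim their regions first
        # A k-base motif followed by at least (min - 1) further identical copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats[k] - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            start, end = m.span()
            # Skip compound motifs and regions already reported for a shorter motif.
            if is_compound(motif) or any(covered[start:end]):
                continue
            for i in range(start, end):
                covered[i] = True
            hits.append((start, motif, (end - start) // k))
    return sorted(hits)


print(find_ssrs("GG" + "AT" * 6 + "CC" + "A" * 10 + "G"))
# prints [(2, 'AT', 6), (16, 'A', 10)]
```

Processing short motifs first and rejecting compound motifs stops a mononucleotide run from also being reported as a di- or tetranucleotide repeat.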
27
Guo Y, Ye F, Sheng Q, Clark T, Samuels DC. Three-stage quality control strategies for DNA re-sequencing data. Brief Bioinform 2013; 15:879-89. [PMID: 24067931 DOI: 10.1093/bib/bbt069] [Citation(s) in RCA: 117] [Impact Index Per Article: 10.6] [Indexed: 01/15/2023]
Abstract
Advances in next-generation sequencing (NGS) technologies have greatly improved our ability to detect genomic variants for biomedical research. In particular, NGS technologies have been recently applied with great success to the discovery of mutations associated with the growth of various tumours and in rare Mendelian diseases. The advance in NGS technologies has also created significant challenges in bioinformatics. One of the major challenges is quality control of the sequencing data. In this review, we discuss the proper quality control procedures and parameters for Illumina technology-based human DNA re-sequencing at three different stages of sequencing: raw data, alignment and variant calling. Monitoring quality control metrics at each of the three stages of NGS data provides unique and independent evaluations of data quality from differing perspectives. Properly conducting quality control protocols at all three stages and correctly interpreting the quality control results are crucial to ensure a successful and meaningful study.
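The sketch below illustrates only the first of the three stages, raw-data QC. It decodes per-base Phred scores from a FASTQ quality string and summarizes a single read. It assumes Phred+33 encoding, which is used by Sanger FASTQ and Illumina 1.8+. The Q30 cutoff is an illustrative choice, not a threshold taken from the review. Alignment-level and variant-level checks are not covered.

```python
def phred_scores(quality: str, offset: int = 33) -> list:
    """Decode an ASCII FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def read_qc_summary(quality: str, q_threshold: int = 30) -> tuple:
    """Return (mean Phred score, fraction of bases with Q >= q_threshold)."""
    scores = phred_scores(quality)
    if not scores:
        raise ValueError("empty quality string")
    mean_q = sum(scores) / len(scores)
    frac_high = sum(q >= q_threshold for q in scores) / len(scores)
    return mean_q, frac_high


# Under Phred+33, 'I' encodes Q40 and '#' encodes Q2.
mean_q, frac_q30 = read_qc_summary("IIII##")
print(f"mean Q = {mean_q:.2f}, fraction >= Q30 = {frac_q30:.2f}")
# prints: mean Q = 27.33, fraction >= Q30 = 0.67
```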
28
Dockter RB, Elzinga DB, Geary B, Maughan PJ, Johnson LA, Tumbleson D, Franke J, Dockter K, Stevens MR. Developing molecular tools and insights into the Penstemon genome using genomic reduction and next-generation sequencing. BMC Genet 2013; 14:66. [PMID: 23924218 PMCID: PMC3751293 DOI: 10.1186/1471-2156-14-66] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 09/15/2012] [Accepted: 08/01/2013] [Indexed: 11/10/2022]
Abstract
Background Penstemon’s unique phenotypic diversity, hardiness, and drought tolerance give it great potential for the xeric landscaping industry. Molecular markers will accelerate the breeding and domestication of drought-tolerant Penstemon cultivars by enabling the creation of genetic maps and clarifying phylogenetic relationships. Our objectives were to identify and validate interspecific molecular markers from four diverse Penstemon species in order to gain specific insights into the Penstemon genome. Results We used 454 pyrosequencing and GR-RSC (genome reduction using restriction site conservation) to identify homologous loci across four Penstemon species (P. cyananthus, P. davidsonii, P. dissectus, and P. fruticosus) representing three diverse subgenera with considerable genome size variation. From these genomic data, we identified 133 unique interspecific markers containing SSRs and INDELs, of which 51 produced viable PCR-based markers. These markers produced simple banding patterns in 90% of the species × marker interactions (~84% were polymorphic). Twelve of the markers were tested across 93 mostly xeric Penstemon taxa (72 species), of which ~98% produced reproducible marker data. Additionally, we identified an average of one SNP per 2,890 bp per species and one per 97 bp between any two apparent homologous sequences from the four source species. We selected 192 homologous sequences that met stringent parameters to create SNP markers. Of these, 75 demonstrated repeatable polymorphic marker functionality across the four sequence source species. Finally, sequence analysis indicated that repetitive elements were approximately 70% more prevalent in the P. cyananthus genome, the largest genome in the study, than in the smallest genome surveyed (P. dissectus). Conclusions We demonstrated the utility of GR-RSC to identify homologous loci across related Penstemon taxa.
Though PCR primer regions were conserved across a broadly sampled survey of Penstemon species (93 taxa), DNA sequence within these amplicons (12 SSR/INDEL markers) was highly diverse. With the continued decline in next-generation sequencing costs, it will soon be feasible to use genomic reduction techniques to simultaneously sequence thousands of homologous loci across dozens of Penstemon species. Such efforts will greatly facilitate our understanding of the phylogenetic structure within this important drought tolerant genus. In the interim, this study identified thousands of SNPs and over 50 SSRs/INDELs which should provide a foundation for future Penstemon phylogenetic studies and breeding efforts.
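The SNP densities quoted above, such as one SNP per 97 bp between two homologous sequences, are ratios of compared bases to observed differences. The following is a minimal sketch of that ratio for one pair of sequences that have already been aligned. The handling of gaps and ambiguous bases is an assumption. The sketch does not reproduce the study's stringent SNP-selection parameters.

```python
VALID = set("ACGT")


def bp_per_snp(seq_a: str, seq_b: str) -> float:
    """Aligned bases compared per SNP between two aligned homologous sequences.

    Columns with a gap ('-') or an ambiguous base (e.g. 'N') in either sequence
    are skipped, so indels are not counted as SNPs.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = snps = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in VALID and b in VALID:
            compared += 1
            snps += a != b  # True counts as 1
    return compared / snps if snps else float("inf")


print(bp_per_snp("ACGT-CGTAC", "ACGTTCGAAC"))  # prints 9.0 (9 bases compared, 1 SNP)
```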
29
Napoli E, Wong S, Giulivi C. Evidence of reactive oxygen species-mediated damage to mitochondrial DNA in children with typical autism. Mol Autism 2013; 4:2. [PMID: 23347615 PMCID: PMC3570390 DOI: 10.1186/2040-2392-4-2] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.2] [Received: 10/23/2012] [Accepted: 01/04/2013] [Indexed: 02/05/2023]
Abstract
Background The mitochondrial genome (mtDNA) is particularly susceptible to damage mediated by reactive oxygen species (ROS). Elevated ROS production and elevated biomarkers of oxidative stress have been found in tissues from children with autism spectrum disorders, but evidence for damage to mtDNA is lacking. Findings mtDNA deletions were evaluated in peripheral blood monocytic cells (PBMC). The cells were isolated from 2- to 5-year-old children with full autism (AU; n = 67), from typically developing children (TD; n = 46), and from their parents, all enrolled in the CHildhood Autism Risk from Genes and Environment (CHARGE) study at the University of California, Davis. Sequence variants were evaluated in mtDNA segments covering 31.2% of the entire human mitochondrial genome, in AU and TD children (n = 10 each) and their mothers. Three findings indicated increased mtDNA damage in AU children. (i) mtDNA deletions were 2-fold more frequent. (ii) GC→AT transitions were 2.4-fold more numerous; GC sites are preferred sites for oxidative damage. (iii) G,C,T→A transitions were 1.6-fold more frequent, suggesting that polymerase gamma more often incorporated A at bypassed apurinic/apyrimidinic sites, probably originating from oxidative stress. The last two outcomes were identical to those of the children's mothers, suggesting the inheritance of a template consistent with increased oxidative damage, whereas the frequency of mtDNA deletions in AU children was similar to that of their fathers. Conclusions These results suggest that a combination of genetic and epigenetic factors acting during the perinatal period results in an mtDNA template in children with autism similar to that expected for older individuals.
Affiliation(s)
- Eleonora Napoli
- Department of Molecular Biosciences, University of California, One Shields Ave, 1120 Haring Hall, Davis, CA, 95616, USA.
30
Byers RL, Harker DB, Yourstone SM, Maughan PJ, Udall JA. Development and mapping of SNP assays in allotetraploid cotton. Theor Appl Genet 2012; 124:1201-14. [PMID: 22252442 PMCID: PMC3324690 DOI: 10.1007/s00122-011-1780-8] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.8] [Received: 08/03/2011] [Accepted: 12/22/2011] [Indexed: 05/06/2023]
Abstract
A narrow germplasm base and a complex allotetraploid genome have made the discovery of single nucleotide polymorphism (SNP) markers difficult in cotton (Gossypium hirsutum). To generate sequence for SNP discovery, we conducted a genome reduction experiment (EcoRI, BafI double digest, followed by adapter ligation, biotin-streptavidin purification, and agarose gel separation) on two accessions of G. hirsutum and two accessions of G. barbadense. From the genome reduction experiment, a total of 2.04 million genomic sequence reads were assembled into contigs with an N50 of 508 bp and analyzed for SNPs. A previously generated assembly of expressed sequence tags (ESTs) provided an additional source for SNP discovery. Using highly conservative parameters (minimum coverage of 8× at each SNP and 20% minor allele frequency), a total of 11,834 and 1,679 non-genic SNPs were identified between accessions of G. hirsutum and G. barbadense in genome reduction assemblies, respectively. An additional 4,327 genic SNPs were also identified between accessions of G. hirsutum in the EST assembly. KBioscience KASPar assays were designed for a portion of the intra-specific G. hirsutum SNPs. From 704 non-genic and 348 genic markers developed, a total of 367 (267 non-genic, 100 genic) mapped in a segregating F2 population (Acala Maxxa × TX2094) using the Fluidigm EP1 system. A G. hirsutum genetic linkage map of 1,688 cM was constructed based entirely on these new SNP markers. For the genic SNPs, we were able to identify the genome ('A' or 'D') in which each SNP resided using diploid species sequence data. Genetic maps generated by these newly identified markers are being used to locate quantitative, economically important regions within the cotton genome.
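This abstract reports two summary statistics that a short example can illustrate. The sketch below computes contig N50 and applies a per-site filter modeled on the stated thresholds: at least 8× coverage and a 20% minor allele frequency. The abstract does not say how reads were counted toward the minor allele. Treating `allele_counts` as pooled read counts per base is therefore an assumption, and this is not the authors' pipeline.

```python
def n50(lengths):
    """Length of the contig at which the running total of contig lengths,
    sorted longest first, first reaches half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs supplied")


def passes_snp_filter(allele_counts, min_coverage=8, min_maf=0.20):
    """Apply a read-depth and minor-allele-frequency filter at one site.

    allele_counts maps each observed base to its read count, e.g. {"A": 6, "G": 2}.
    """
    depth = sum(allele_counts.values())
    if depth < min_coverage or len(allele_counts) < 2:
        return False
    minor = sorted(allele_counts.values(), reverse=True)[1]  # second most common base
    return minor / depth >= min_maf


print(n50([100, 200, 300, 400]))          # prints 300
print(passes_snp_filter({"A": 6, "G": 2}))  # prints True (depth 8, MAF 0.25)
```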
Affiliation(s)
- Robert L. Byers
- Department of Plant and Wildlife Sciences, Brigham Young University, Provo, UT 84602 USA
- David B. Harker
- Department of Plant and Wildlife Sciences, Brigham Young University, Provo, UT 84602 USA
- Scott M. Yourstone
- Department of Plant and Wildlife Sciences, Brigham Young University, Provo, UT 84602 USA
- Peter J. Maughan
- Department of Plant and Wildlife Sciences, Brigham Young University, Provo, UT 84602 USA
- Joshua A. Udall
- Department of Plant and Wildlife Sciences, Brigham Young University, Provo, UT 84602 USA
31
Byers RL, Harker DB, Yourstone SM, Maughan PJ, Udall JA. Development and mapping of SNP assays in allotetraploid cotton. Theor Appl Genet 2012. [PMID: 22252442 DOI: 10.1007/s00122-011-1780-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/25/2022]
Affiliation(s)
- Robert L Byers
- Department of Plant and Wildlife Sciences, Brigham Young University, Provo, UT 84602, USA
32
Bachlava E, Taylor CA, Tang S, Bowers JE, Mandel JR, Burke JM, Knapp SJ. SNP discovery and development of a high-density genotyping array for sunflower. PLoS One 2012; 7:e29814. [PMID: 22238659 PMCID: PMC3251610 DOI: 10.1371/journal.pone.0029814] [Citation(s) in RCA: 92] [Impact Index Per Article: 7.7] [Received: 10/13/2011] [Accepted: 12/06/2011] [Indexed: 11/23/2022]
Abstract
Recent advances in next-generation DNA sequencing technologies have made possible the development of high-throughput SNP genotyping platforms that allow for the simultaneous interrogation of thousands of single-nucleotide polymorphisms (SNPs). Such resources have the potential to facilitate the rapid development of high-density genetic maps, and to enable genome-wide association studies as well as molecular breeding approaches in a variety of taxa. Herein, we describe the development of a SNP genotyping resource for use in sunflower (Helianthus annuus L.). This work involved the development of a reference transcriptome assembly for sunflower, the discovery of thousands of high quality SNPs based on the generation and analysis of ca. 6 Gb of transcriptome re-sequencing data derived from multiple genotypes, the selection of 10,640 SNPs for inclusion in the genotyping array, and the use of the resulting array to screen a diverse panel of sunflower accessions as well as related wild species. The results of this work revealed a high frequency of polymorphic SNPs and relatively high level of cross-species transferability. Indeed, greater than 95% of successful SNP assays revealed polymorphism, and more than 90% of these assays could be successfully transferred to related wild species. Analysis of the polymorphism data revealed patterns of genetic differentiation that were largely congruent with the evolutionary history of sunflower, though the large number of markers allowed for finer resolution than has previously been possible.
Affiliation(s)
- Eleni Bachlava
- Center for Applied Genetic Technologies, University of Georgia, Athens, Georgia, United States of America
- Christopher A. Taylor
- Center for Applied Genetic Technologies, University of Georgia, Athens, Georgia, United States of America
- Shunxue Tang
- Center for Applied Genetic Technologies, University of Georgia, Athens, Georgia, United States of America
- John E. Bowers
- Center for Applied Genetic Technologies, University of Georgia, Athens, Georgia, United States of America
- Department of Plant Biology, University of Georgia, Athens, Georgia, United States of America
- Jennifer R. Mandel
- Department of Plant Biology, University of Georgia, Athens, Georgia, United States of America
- John M. Burke
- Department of Plant Biology, University of Georgia, Athens, Georgia, United States of America
- Steven J. Knapp
- Center for Applied Genetic Technologies, University of Georgia, Athens, Georgia, United States of America
33
Abstract
MicroRNAs (miRNAs) are among the most important regulatory elements of gene expression in animals and plants. However, their origin and evolutionary dynamics have not been studied systematically. In this paper, we identified putative miRNA genes in 11 plant species using bioinformatic techniques and examined their evolutionary changes. Our homology search indicated that no miRNA gene is currently shared between green algae and land plants. The number of miRNA genes has increased substantially in the land plant lineage, but after the divergence of eudicots and monocots, the number has changed in a lineage-specific manner. We found that miRNA genes have originated mainly by duplication of preexisting miRNA genes or protein-coding genes. Transposable elements also seem to have contributed to the generation of species-specific miRNA genes. The relative importance of these mechanisms in plants is quite different from that in Drosophila species, where the formation of hairpin structures in the genome seems to be a major source of miRNA genes. This difference in the origin of miRNA genes between plants and Drosophila may be explained by differences in how miRNAs bind their target mRNAs in plants and animals. We also found that young miRNA genes are less conserved than old genes in plants, as in Drosophila species. Yet nearly half of the gene families present in the ancestor of flowering plants have been lost in at least one species examined. This indicates that the repertoires of miRNA genes have changed more dynamically during plant evolution than previously thought.
Affiliation(s)
- Masafumi Nozawa
- Department of Biology, Institute of Molecular Evolutionary Genetics, Pennsylvania State University, PA, USA.
34
Khan MA, Han Y, Zhao YF, Korban SS. A high-throughput apple SNP genotyping platform using the GoldenGate™ assay. Gene 2011; 494:196-201. [PMID: 22209719 DOI: 10.1016/j.gene.2011.12.001] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 10/04/2011] [Revised: 11/29/2011] [Accepted: 12/01/2011] [Indexed: 10/14/2022]
Abstract
EST data generated from 14 apple genotypes were downloaded from NCBI and mapped against a reference EST assembly to identify single nucleotide polymorphisms (SNPs). Mapping used a 90% sequence-similarity threshold and a minimum coverage of four reads at each SNP position. In total, 37,807 SNPs were identified across 6888 unique EST contigs, an average of one SNP every 187 bp. Identified SNPs were checked for flanking sequences of ≥ 60 bp on both sides of the SNP alleles to allow reliable design of a custom high-throughput genotyping assay. A total of 12,299 SNPs, representing 6525 contigs, met this ≥ 60 bp flanking-sequence criterion. Of these, 1411 SNPs were validated using four apple genotypes. Based on the genotyping assays, an estimated 60% of SNPs were valid, while 26% might be derived from paralogous regions.
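The two screening criteria described above can be written as a simple filter: at least four reads at the SNP position, and at least 60 bp of flanking sequence on each side for assay design. This is a hypothetical sketch. The `CandidateSNP` record and the 0-based coordinate convention are assumptions. The 90% similarity threshold applied during read mapping is not modeled.

```python
from typing import NamedTuple


class CandidateSNP(NamedTuple):
    contig: str
    pos: int       # 0-based position of the SNP within its contig
    coverage: int  # number of reads covering the SNP position


def has_flanks(contig_len: int, pos: int, min_flank: int = 60) -> bool:
    """True if at least min_flank bases lie on each side of the SNP."""
    left = pos                    # bases before the SNP
    right = contig_len - pos - 1  # bases after the SNP
    return left >= min_flank and right >= min_flank


def designable_snps(snps, contig_lengths, min_coverage=4, min_flank=60):
    """Keep SNPs that meet both the coverage and flanking-sequence criteria."""
    return [
        s for s in snps
        if s.coverage >= min_coverage
        and has_flanks(contig_lengths[s.contig], s.pos, min_flank)
    ]


lengths = {"c1": 121, "c2": 500}
candidates = [CandidateSNP("c1", 60, 5), CandidateSNP("c1", 59, 5),
              CandidateSNP("c2", 250, 3), CandidateSNP("c2", 250, 4)]
print(designable_snps(candidates, lengths))
```

Keeping the flank test separate from the coverage test lets either threshold be changed without touching the other.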
Affiliation(s)
- M Awais Khan
- Department of Natural Resources & Environmental Sciences, University of Illinois, Urbana, IL 61801, USA
35
Jung H, Lyons RE, Dinh H, Hurwood DA, McWilliam S, Mather PB. Transcriptomics of a giant freshwater prawn (Macrobrachium rosenbergii): de novo assembly, annotation and marker discovery. PLoS One 2011; 6:e27938. [PMID: 22174756 PMCID: PMC3234237 DOI: 10.1371/journal.pone.0027938] [Citation(s) in RCA: 91] [Impact Index Per Article: 7.0] [Received: 09/15/2011] [Accepted: 10/28/2011] [Indexed: 01/12/2023]
Abstract
BACKGROUND The giant freshwater prawn (Macrobrachium rosenbergii, GFP) is the most economically important freshwater crustacean species. However, as little is known about its genome, 454 pyrosequencing of cDNA was undertaken to characterise its transcriptome and identify genes important for growth. METHODOLOGY AND PRINCIPAL FINDINGS A collection of 787,731 sequence reads (244.37 Mb) was obtained by 454 pyrosequencing of cDNA prepared from muscle, ovary and testis tissues of 18 adult prawns. These reads were assembled into 123,534 expressed sequence tags (ESTs). Of these, 46% of the 8,411 contigs and 19% of the 115,123 singletons showed high similarity to sequences in the GenBank non-redundant database. Most significant (E value < 1e-5) contig (80%) and singleton (84%) matches were to crustacean and insect sequences. KEGG analysis of the contig open reading frames identified putative members of several biological pathways potentially important for growth. The top InterProScan domains detected included RNA recognition motifs, serine/threonine-protein kinase-like domains, actin-like families, and zinc finger domains. Transcripts were abundant from genes with fundamental roles in muscle development and construction, such as actin, myosin heavy and light chains, tropomyosin and troponin. Among the contigs, 834 single nucleotide polymorphisms, 1198 indels and 658 simple sequence repeat motifs were also identified. CONCLUSIONS The M. rosenbergii transcriptome data reported here should provide an invaluable resource for improving our understanding of this species' genome structure and biology. The data will also inform future functional studies to manipulate or select for genes influencing growth, which should find practical applications in aquaculture breeding programs.
Affiliation(s)
- Hyungtaek Jung
- Biogeosciences, Queensland University of Technology, Brisbane, Queensland, Australia.
36
Gaut B, Yang L, Takuno S, Eguiarte LE. The Patterns and Causes of Variation in Plant Nucleotide Substitution Rates. Annu Rev Ecol Evol Syst 2011. [DOI: 10.1146/annurev-ecolsys-102710-145119] [Citation(s) in RCA: 114] [Impact Index Per Article: 8.8] [Indexed: 11/09/2022]
Affiliation(s)
- Brandon Gaut
- Department of Ecology and Evolutionary Biology, University of California, Irvine, California 92697
- Liang Yang
- Department of Ecology and Evolutionary Biology, University of California, Irvine, California 92697
- Shohei Takuno
- Department of Ecology and Evolutionary Biology, University of California, Irvine, California 92697
- Luis E. Eguiarte
- Instituto de Ecología, Universidad Nacional Autónoma de México, CP 04510 Mexico City, Mexico
37
Context-Dependent Evolutionary Models for Non-Coding Sequences: An Overview of Several Decades of Research and an Analysis of Laurasiatheria and Primate Evolution. Evol Biol 2011. [DOI: 10.1007/s11692-011-9139-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/17/2022]
38
Flowers JM, Molina J, Rubinstein S, Huang P, Schaal BA, Purugganan MD. Natural Selection in Gene-Dense Regions Shapes the Genomic Pattern of Polymorphism in Wild and Domesticated Rice. Mol Biol Evol 2011; 29:675-87. [DOI: 10.1093/molbev/msr225] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.2] [Indexed: 02/06/2023]
39
Baele G, Van de Peer Y, Vansteelandt S. Context-dependent codon partition models provide significant increases in model fit in atpB and rbcL protein-coding genes. BMC Evol Biol 2011; 11:145. [PMID: 21619569 PMCID: PMC3126739 DOI: 10.1186/1471-2148-11-145] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 01/05/2011] [Accepted: 05/27/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND Accurate modelling of substitution processes in protein-coding sequences is often hampered by the computational burden of full codon models. Codon partition models have recently been proposed as a viable alternative, mimicking the substitution behaviour of codon models at a low computational cost. However, such codon partition models impose independent evolution of the different codon positions, which is overly restrictive from a biological point of view. Empirical research has indicated context-dependent substitution patterns at four-fold degenerate sites, and we take those indications into account in this paper. RESULTS We present so-called context-dependent codon partition models to assess previous empirical claims that the evolution of four-fold degenerate sites is strongly dependent on the composition of their two flanking bases. To this end, we have estimated and compared various existing independent models, codon models, codon partition models and context-dependent codon partition models for the atpB and rbcL genes of the chloroplast genome, which are frequently used in plant systematics. These context-dependent codon partition models employ a full dependency scheme for four-fold degenerate sites, whilst maintaining the independence assumption for the first and second codon positions. CONCLUSIONS We show that, in both the atpB and the rbcL alignments of a collection of land plants, these context-dependent codon partition models significantly improve model fit over existing codon partition models. Using Bayes factors based on thermodynamic integration, we show that in both datasets the same context-dependent codon partition model yields the largest increase in model fit compared to an independent evolutionary model.
Context-dependent codon partition models hence come closer to the performance of codon models than codon partition models do. Codon models remain the best-performing models, but at a drastically increased computational cost, so context-dependent codon partition models remain computationally attractive alternatives to them. Finally, we observe that the substitution patterns in the two datasets are drastically different, which leads us to conclude that combined analysis of these two genes using a single model may not be advisable from a context-dependent point of view.
Affiliation(s)
- Guy Baele
- Department of Plant Systems Biology, Ghent, Belgium
40
Relative mutation rates of each nucleotide for another estimated from allele frequency spectra at human gene loci. Genet Res (Camb) 2009; 91:293-303. [PMID: 19640324 DOI: 10.1017/s0016672309990164] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/07/2022]
Abstract
This study aims to comprehensively examine the mutation rates of one base for another in human gene loci. Most previous efforts were based on divergence data from untranscribed regions. In contrast, the present study applies the basic theory of the reversible recurrent mutation model to large-scale, high-quality re-sequencing data for gene loci from public databases. Population mutation parameters (4Nν and 4Nμ) are obtained for each pair of base substitutions. The estimated parameters show good strand reversal symmetry, supporting the existence of mutation-drift equilibrium. Specific gene regions were analyzed, including mRNA, coding sequence (CDS), 5'-untranslated region (5'-UTR), 3'-UTR and intron. The mutation rate of each base for another clearly differs depending on the region in which the base is located. Analyses that take the adjacent bases into account exhibit excellent strand reversal symmetry, confirming that the identity of an adjacent base influences mutation rates. In intron regions, CpG to TpG (or CpG to CpA) substitution occurs at a rate approximately seven-fold higher than the reverse transition because of cytosine deamination. This effect is strongly reduced in mRNA regions and almost entirely lost in 5'-UTRs. However, transitions are increased overall at sites other than CpGs, and CpGs make up only a limited proportion of the total sequence. Therefore CpG methylation is not the main factor responsible for the increased rate of transitions relative to transversions. After average mutation rates are adjusted for sequence composition, no substitution bias is found between A+T and C+G, indicating base composition equilibrium in human gene loci. Population differences are also identified between groups of people of African and European descent, presumably reflecting past population histories.
By applying the basic theory of population genetics to re-sequenced data, this study contributes new, detailed information regarding mutations in human gene regions.
41
Katariya PR, Vadhiyar SS. Phylogenetic Predictions on Grids. 2009 Fifth IEEE International Conference on e-Science 2009. [DOI: 10.1109/e-science.2009.17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
42
Tang P, Wang Q, Chen JQ. [The patterns and influences of insertions, deletions and nucleotide substitutions in Solanaceae chloroplast genome]. Yi Chuan 2009; 30:1506-12. [PMID: 19073561 DOI: 10.3724/sp.j.1005.2008.01506] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/25/2022]
Abstract
Nucleotide substitutions and indels (insertions and deletions) are major evolutionary driving forces. Indel and nucleotide substitution patterns were compared in two pairs of Solanaceae chloroplast genomes: Solanum lycopersicum L. versus Solanum bulbocastanum L., and Nicotiana tomentosiformis L. versus Nicotiana tabacum L. The influence of mutation on genome composition was analyzed. The indels and substitutions were not randomly distributed throughout the chloroplast genomes. The indels occurred in AT-rich regions. Single-base-pair indels accounted for more than 30% of the total indels, and most indels were shorter than 10 bp. The nucleotide substitutions showed a transition/transversion (Ts/Tv) bias, but the frequencies of T→G and A→C transversions were significantly increased. Ts/Tv rates were lineage-specific: the Ts/Tv rate between S. lycopersicum and S. bulbocastanum was lower than that between N. tomentosiformis and N. tabacum. (A+T)/(G+C) rates varied among lineages, which influenced the (G+C)% of the genomes. The changes in (A+T)/(G+C) rates might correlate with the life histories of different species.
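The Ts/Tv comparison above depends on classifying each substitution. A transition is purine to purine or pyrimidine to pyrimidine; any other change is a transversion. A minimal sketch for two aligned sequences follows. It is unpolarized, so without an outgroup it cannot distinguish, for example, T→G from G→T. It illustrates the counting only and is not the method used in the cited study.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def substitution_type(a: str, b: str):
    """Return 'Ts' (transition), 'Tv' (transversion), or None if a == b."""
    if a == b:
        return None
    pair = {a, b}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "Ts"
    return "Tv"


def count_ts_tv(seq_a: str, seq_b: str) -> tuple:
    """Count (transitions, transversions) between two aligned sequences,
    skipping columns with gaps or ambiguous bases."""
    ts = tv = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        kind = substitution_type(a, b)
        if kind == "Ts":
            ts += 1
        elif kind == "Tv":
            tv += 1
    return ts, tv


ts, tv = count_ts_tv("AAGCTT", "GTGCCA")
print(ts, tv, ts / tv)  # prints 2 2 1.0
```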
Affiliation(s)
- Ping Tang
- Biological Department, College of Life Science, Nanjing University, Nanjing 210093, China.
43
Hale MC, McCormick CR, Jackson JR, Dewoody JA. Next-generation pyrosequencing of gonad transcriptomes in the polyploid lake sturgeon (Acipenser fulvescens): the relative merits of normalization and rarefaction in gene discovery. BMC Genomics 2009; 10:203. [PMID: 19402907 PMCID: PMC2688523 DOI: 10.1186/1471-2164-10-203] [Citation(s) in RCA: 123] [Impact Index Per Article: 8.2] [Received: 12/23/2008] [Accepted: 04/29/2009] [Indexed: 11/25/2022]
Abstract
Background Next-generation sequencing technologies have been applied most often to model organisms or species closely related to a model. However, these methods have the potential to be valuable in many wild organisms, including those of conservation concern. We used Roche 454 pyrosequencing to characterize gene expression in polyploid lake sturgeon (Acipenser fulvescens) gonads. Results Titration runs on a Roche 454 GS-FLX produced more than 47,000 sequencing reads. These reads represented 20,741 unique sequences that passed quality control (mean length = 186 bp). These were assembled into 1,831 contigs (mean contig depth = 4.1 sequences). Over 4,000 sequencing reads (~19%) were assigned gene ontologies, mostly to protein, RNA, and ion binding. A total of 877 candidate SNPs were identified from > 50 different genes. We employed an analytical approach from theoretical ecology (rarefaction) to evaluate depth of sequencing coverage relative to gene discovery. We also considered the relative merits of normalized versus native cDNA libraries when using next-generation sequencing platforms. Not surprisingly, fewer genes from the normalized libraries were rRNA subunits. Rarefaction suggests that normalization has little influence on the efficiency of gene discovery, at least when working with thousands of reads from a single tissue type. Conclusion Our data indicate that titration runs on 454 sequencers can characterize thousands of expressed sequence tags which can be used to identify SNPs, gene ontologies, and levels of gene expression in species of conservation concern. We anticipate that rarefaction will be useful in evaluations of gene discovery and that next-generation sequencing technologies hold great potential for the study of other non-model organisms.
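The rarefaction approach mentioned above estimates how many distinct genes would have been observed if fewer reads had been sequenced. The sketch below uses the standard hypergeometric rarefaction formula from ecology. It assumes that each read has already been assigned to exactly one gene or contig. It illustrates the calculation and is not the authors' analysis code.

```python
from math import comb


def rarefy(counts, n):
    """Expected number of distinct genes in a random subsample of n reads drawn
    without replacement from the full library.

    counts[i] is the number of reads assigned to gene i. Uses the hypergeometric
    expectation E[S_n] = sum_i (1 - C(N - N_i, n) / C(N, n)), where N = sum(counts).
    """
    total_reads = sum(counts)
    if not 0 <= n <= total_reads:
        raise ValueError("n must lie between 0 and the total number of reads")
    denom = comb(total_reads, n)
    # comb(k, n) is 0 when n > k, so genes that must be sampled contribute exactly 1.
    return sum(1 - comb(total_reads - c, n) / denom for c in counts if c > 0)


def rarefaction_curve(counts, step):
    """(n, expected genes) points from 0 reads up to the full library size."""
    total_reads = sum(counts)
    return [(n, rarefy(counts, n)) for n in range(0, total_reads + 1, step)]


print(rarefaction_curve([1, 1, 2], 1))
```

A curve that flattens as `n` approaches the library size suggests that further sequencing of the same library would reveal few new genes.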
Affiliation(s)
- Matthew C Hale
- Department of Forestry and Natural Resources, Purdue University, West Lafayette, IN 47907, USA.
44
Morton BR, Dar VUN, Wright SI. Analysis of site frequency spectra from Arabidopsis with context-dependent corrections for ancestral misinference. Plant Physiology 2009; 149:616-624. [PMID: 19019983 PMCID: PMC2633827 DOI: 10.1104/pp.108.127787] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 08/08/2008] [Accepted: 11/12/2008] [Indexed: 05/27/2023]
Abstract
Previous studies have shown that the pattern of single nucleotide polymorphism (SNP) in Arabidopsis (Arabidopsis thaliana) deviates from the distribution expected under a neutral model. Here, we test whether ancestral misinference could explain this deviation. We first show that sequence context has significant and complex influences on mutation dynamics in Arabidopsis, as inferred from SNP frequencies, and compare these results with observations on context dependency from a previous analysis of a maize (Zea mays) SNP dataset. The data on heterogeneity across sites are then used to correct for ancestral misinference in a context-dependent manner. Using Arabidopsis lyrata to infer the ancestral state of each SNP, we show that the resulting unfolded site frequency spectrum (SFS) in Arabidopsis is skewed toward sites with high-frequency derived nucleotides. Sites are also partitioned into two general functional classes: second codon positions and 4-fold degenerate sites. These two classes show different SFS: although both show an overrepresentation of high-frequency derived sites, low-frequency derived sites are vastly overrepresented at second codon positions but significantly underrepresented at 4-fold degenerate sites. We find that these results are robust to corrections for ancestral misinference, even when context-dependent variation in mutation properties is taken into account. The data suggest that, in addition to purifying selection, complex demographic events and/or linked positive selection must be invoked to explain the SFS, and they highlight the importance of sequence context in analyses of genome-wide variation.
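As a rough illustration of how an outgroup such as Arabidopsis lyrata is used to unfold the SFS, the sketch below polarizes biallelic sites by taking the outgroup base as ancestral and tallies derived-allele counts. It is a minimal sketch under stated assumptions: the input format (a string of sampled bases plus one outgroup base per site) and the function name are hypothetical, and it applies none of the context-dependent corrections for ancestral misinference that the paper is about; it simply skips sites whose ancestral state cannot be assigned.

```python
from collections import Counter


def unfolded_sfs(sites, n_samples):
    """Unfolded site frequency spectrum from outgroup-polarized sites.

    sites: iterable of (ingroup_bases, outgroup_base), e.g. ("AAAT", "A"),
           a hypothetical input format with one base per sampled sequence.
    Returns a list where entry k counts sites with k derived copies.
    """
    sfs = [0] * (n_samples + 1)
    for ingroup, outgroup in sites:
        if len(ingroup) != n_samples:
            continue  # incomplete sampling at this site
        counts = Counter(ingroup)
        # Keep only biallelic sites where the outgroup matches one allele,
        # so that allele can be taken as ancestral (no misinference correction).
        if len(counts) != 2 or outgroup not in counts:
            continue
        derived = n_samples - counts[outgroup]
        sfs[derived] += 1
    return sfs


if __name__ == "__main__":
    toy_sites = [("AAAT", "A"), ("AATT", "T"), ("GGGC", "C"), ("AAAA", "A")]
    print(unfolded_sfs(toy_sites, 4))
```

Entry k of the result counts sites where k of the n sampled sequences carry the derived base. If the outgroup base is wrong (ancestral misinference), a site's count moves from k to n − k, which is why misinference can inflate the high-frequency derived classes and why the authors test whether correcting for it removes the skew.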
Affiliation(s)
- Brian R Morton
- Department of Biological Science, Barnard College, Columbia University, New York, New York 10027, USA.
45
The complementary neighborhood patterns and methylation-to-mutation likelihood structures of 15,110 single-nucleotide polymorphisms in the bovine genome. Genetics 2008; 180:639-47. [PMID: 18716328 DOI: 10.1534/genetics.108.090860] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/18/2022]
Abstract
Bayesian analysis was performed to examine single-nucleotide polymorphism (SNP) neighborhood patterns in cattle using 15,110 SNPs, each with 500 bp of flanking sequence. Our analysis confirmed three well-known features reported in plants and/or other animals: (1) transitions are the most abundant type of SNP, accounting for 69.8% in cattle; (2) transversions occur most frequently (38.56%) in cattle when the A + T content of the two immediately adjacent sites equals two (i.e., both flanking bases are A or T); and (3) C ↔ T and A ↔ G transitions have reverse-complementary neighborhood patterns, as do A ↔ C and G ↔ T transversions. Our study also revealed several novel SNP neighborhood patterns that have not been reported previously. First, cattle and humans share an overall SNP pattern, indicating a common mutation system in mammals. Second, unlike C ↔ T/A ↔ G and A ↔ C/G ↔ T, the true neighborhood patterns of A ↔ T and C ↔ G may remain unresolved: each of these changes is its own reverse complement, so the sense and antisense sequences flanking them cannot be told apart. Third, among the four reclassified SNP types, the neighborhood ratio of A + T to G + C differed considerably: it was lowest for C ↔ G, higher for C ↔ T/A ↔ G, higher still for A ↔ C/G ↔ T, and highest for A ↔ T. Fourth, when the two immediately adjacent sites create a CpG context, transitions increased significantly compared with contexts lacking CpG. Finally, the unequal occurrence of A ↔ G and C ↔ T in five paired neighboring structures indicates that methylation-induced deamination reactions were responsible for approximately 20% of all transitions. In addition, such conversion can occur at both CpG and non-CpG sites. Our study provides new insights into the molecular mechanisms of mutation and genome evolution.
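The four reclassified SNP types in this abstract pool each base change with its reverse complement, so a SNP gets the same label whichever strand it is read from. A minimal sketch of that grouping, plus a CpG-context check and the transition fraction, is shown below; the function names, class labels, and input format are hypothetical and are not taken from the paper.

```python
# Each unordered base pair maps to a strand-symmetric class:
# a change and its reverse complement (e.g. C<->T and G<->A) share a class.
SNP_CLASSES = {
    frozenset("CT"): "C<->T/A<->G",
    frozenset("AG"): "C<->T/A<->G",
    frozenset("AC"): "A<->C/G<->T",
    frozenset("GT"): "A<->C/G<->T",
    frozenset("AT"): "A<->T",
    frozenset("CG"): "C<->G",
}


def snp_class(allele1, allele2):
    """Return the strand-symmetric class of a biallelic SNP (hypothetical helper)."""
    pair = frozenset((allele1.upper(), allele2.upper()))
    if pair not in SNP_CLASSES:
        raise ValueError(f"not a biallelic A/C/G/T SNP: {allele1}/{allele2}")
    return SNP_CLASSES[pair]


def is_transition(allele1, allele2):
    # Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
    return snp_class(allele1, allele2) == "C<->T/A<->G"


def in_cpg_context(left, ref, right):
    """True if the SNP site is the C or the G of a CpG dinucleotide.

    Expects uppercase bases: the left neighbor, the reference base, the right neighbor.
    """
    return (ref == "C" and right == "G") or (ref == "G" and left == "C")


def transition_fraction(snps):
    """Fraction of (allele1, allele2) pairs that are transitions."""
    snps = list(snps)
    return sum(is_transition(a, b) for a, b in snps) / len(snps)
```

The CpG check marks the context in which a methylated cytosine can deaminate to thymine (seen as C → T on one strand, or G → A on the other), the mechanism the abstract links to roughly 20% of transitions.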
46
Zheng T, Ichiba T, Morton BR. Assessing substitution variation across sites in grass chloroplast DNA. J Mol Evol 2007; 64:605-13. [PMID: 17541677 DOI: 10.1007/s00239-006-0076-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 03/30/2006] [Accepted: 02/28/2007] [Indexed: 11/24/2022]
Abstract
We assess the similarity of base substitution processes, described by empirically derived 4 × 4 matrices, using chi-square homogeneity tests. Such significance analyses allow us to assess variation in sequence evolution across sites, and we apply them to matrices derived from noncoding sites in different contexts in grass chloroplast DNA. We show that there is statistically significant variation in rates and patterns of mutation among noncoding sites in different contexts, and then demonstrate a similar and significant influence of context on substitutions at fourfold degenerate sites of coding regions from grass chloroplast DNA. These results show that context has the same general effect on substitution bias in coding and noncoding DNA: the A+T content of flanking bases is correlated with the rate of substitution, transition bias, and GC → AT pressure, while the number of flanking pyrimidines on a single strand is correlated with a mutational bias, or skew, toward pyrimidines. Despite these similar general trends, however, when we compare coding and noncoding matrices we find a statistically significant difference between them even when we control for context. Most notably, fourfold degenerate sites in coding sequences are undergoing substitution at a higher rate, and there are also significant differences in the relationship between pyrimidine skew and the number of flanking pyrimidines. Possible reasons for the differences between coding and noncoding sites are discussed. Furthermore, our analysis illustrates a simple statistical method for comparing substitution processes across sites, allowing better study of variation in evolutionary processes across a genome.
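The chi-square homogeneity test named in this abstract can be sketched for the simplest case of two 4 × 4 substitution count matrices. This sketch assumes one plausible setup, not necessarily the authors' exact procedure: the 12 off-diagonal cells (the substitution types) of each matrix form one row of a 2 × 12 contingency table, substitution types never observed in either matrix are dropped, and the function returns the Pearson statistic and its degrees of freedom. Converting the statistic to a p-value needs the chi-square survival function (for example, `scipy.stats.chi2.sf`), which is left out to keep the sketch dependency-free; the function name is hypothetical.

```python
def homogeneity_chi2(m1, m2):
    """Pearson chi-square homogeneity statistic for two 4x4 substitution
    count matrices (row = original base, column = new base, order ACGT).

    Returns (statistic, degrees_of_freedom).
    """
    # Flatten the 12 off-diagonal cells of each matrix into one table row.
    rows = [[m[i][j] for i in range(4) for j in range(4) if i != j]
            for m in (m1, m2)]
    col_totals = [a + b for a, b in zip(*rows)]
    # Drop substitution types absent from both matrices (expected count 0).
    keep = [k for k, total in enumerate(col_totals) if total > 0]
    row_totals = [sum(row[k] for k in keep) for row in rows]
    if 0 in row_totals:
        raise ValueError("each matrix needs at least one substitution")
    grand = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(rows, row_totals):
        for k in keep:
            # Expected count if both matrices share one substitution pattern.
            expected = row_total * col_totals[k] / grand
            stat += (row[k] - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(keep) - 1)
    return stat, dof
```

Because the test compares proportions rather than raw counts, two matrices with the same substitution pattern but different numbers of sites give a statistic of zero; only differences in the relative frequencies of substitution types, as between contexts in the abstract, push it up.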
Affiliation(s)
- Tian Zheng
- Department of Statistics, Columbia University, New York, NY 10027, USA
47
Wang GZ, Chen LL, Zhang HY. Neighboring-site effects of amino acid mutation. Biochem Biophys Res Commun 2007; 353:531-4. [PMID: 17198679 DOI: 10.1016/j.bbrc.2006.12.089] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 12/04/2006] [Accepted: 12/09/2006] [Indexed: 11/18/2022]
Abstract
Although the context dependence of nucleotide mutation has been supported by accumulating theoretical and experimental evidence, whether this effect extends to amino acid mutation remains unclear. Because amino acid doublets (20 × 20 = 400) are far more diverse than their nucleotide counterparts (4 × 4 = 16), attempts to address the neighboring-site effects of amino acid mutation have been frustrated by insufficient amino acid mutation data. Based on 599,745 recently identified mutation sites in 45,137 orthologous proteins, we provide the first solid evidence for the existence of neighboring-site effects in amino acid mutation, a finding that is important for improving prevalent protein-evolution models.
Affiliation(s)
- Guang-Zhong Wang
- Shandong Provincial Research Center for Bioinformatic Engineering and Technique, Center for Advanced Study, Shandong University of Technology, Zibo 255049, PR China
48
Wang GZ, Chen LL, Zhang HY. Phase-dependent nucleotide substitution in protein-coding sequences. Biochem Biophys Res Commun 2007; 355:599-602. [PMID: 17300744 DOI: 10.1016/j.bbrc.2007.01.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 12/26/2006] [Accepted: 01/02/2007] [Indexed: 11/21/2022]
Abstract
It is well known that, owing to the degeneracy of the genetic code, most silent substitutions occur at the third codon position, so the mutation frequency at the third codon position is much higher than at the first two positions. However, it remains unknown whether the directionality of point mutations is similar across the three codon positions. In this paper, by analyzing 15 sets of orthologous genes, we show that the frequencies of most substitution types differ significantly between any two codon positions, especially between the second and third positions. Furthermore, the average frequencies of each substitution type, calculated from the 15 sets of orthologous genes, are similar to those identified among single nucleotide polymorphisms (SNPs) in the human and mouse genomes. These analyses suggest that nucleotide substitution in protein-coding sequences is not only context-dependent (the so-called neighboring-nucleotide effect) but also phase-dependent, which is significant for improving prevalent nucleotide-evolution models.
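To make "phase-dependent" concrete, the sketch below tallies nucleotide differences between two aligned, in-frame coding sequences separately for codon positions 1, 2 and 3. It is a hypothetical helper, not the authors' pipeline: it treats the first sequence as the reference state (a real analysis of substitution direction would need an outgroup or ancestral reconstruction) and skips gaps and ambiguous bases.

```python
from collections import Counter


def substitutions_by_phase(ref, other):
    """Count directed differences ("ref base>other base") at each codon position.

    Both sequences must be aligned, of equal length, and start at a codon boundary.
    Returns {1: Counter, 2: Counter, 3: Counter}.
    """
    if len(ref) != len(other) or len(ref) % 3 != 0:
        raise ValueError("sequences must be aligned, equal-length and in frame")
    counts = {1: Counter(), 2: Counter(), 3: Counter()}
    for i, (a, b) in enumerate(zip(ref.upper(), other.upper())):
        # Skip identical sites, gaps ('-') and ambiguity codes such as N.
        if a != b and a in "ACGT" and b in "ACGT":
            counts[i % 3 + 1][f"{a}>{b}"] += 1
    return counts


if __name__ == "__main__":
    for phase, tally in substitutions_by_phase("ATGGCTAAA", "CTGGCCAA-").items():
        print(phase, dict(tally))
```

Comparing the three counters (after normalizing by the number of sites at each position) is the kind of contrast the abstract describes; with real data the counts would be pooled over many orthologous gene pairs before testing whether the substitution spectra differ between positions.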
Affiliation(s)
- Guang-Zhong Wang
- Shandong Provincial Research Center for Bioinformatic Engineering and Technique, Center for Advanced Study, Shandong University of Technology, Zibo 255049, PR China