1
Li W, Li M, Pu X, Guo Y. Distinguishing the disease-associated SNPs based on composition frequency analysis. Interdiscip Sci 2017; 9:459-467. [PMID: 29143920 DOI: 10.1007/s12539-017-0248-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 02/21/2017] [Revised: 06/03/2017] [Accepted: 06/26/2017] [Indexed: 12/22/2022]
Abstract
Single-nucleotide polymorphisms (SNPs) are a basic form of variation in the genome. When SNPs occur at microRNA binding sites, they can alter binding efficiency, cause mRNA levels to fluctuate in vivo, and thereby produce abnormalities at the posttranscriptional level. SNPs are therefore strongly correlated with disease. Although an enormous number of SNPs have been identified experimentally, only a tiny proportion are truly disease-associated SNPs (dSNPs), which relate to microRNA modification and are thereby involved in the disease-causing process. It is therefore important to distinguish dSNPs from ordinary SNPs. The analysis here shows that composition differs between sequence segments centered on dSNPs and those centered on ordinary SNPs. Inspired by the composition, transition and distribution features, which are meaningful and effective in characterizing protein sequence information, we adapted them to represent the frequency and physicochemical properties of a gene sequence. A binary encoding scheme was also used to label the four nucleotides (A, T, C, and G). First, clustering analysis was performed to obtain reasonable negative samples. Then, optimization tests were run on different ratios of positive to negative samples and on different feature subsets selected by F-score. The optimal model, built with random forest, achieves an accuracy of more than 90% on the testing data set. Moreover, the promising results of the external validation demonstrate the practical applicability of our method. Finally, principal component analysis indicates that all features in our method contribute to the prediction model.
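The abstract does not specify the binary codes or the window size, so the following is only a minimal sketch of the two simplest feature types it mentions: nucleotide composition and binary encoding of a segment centered on a SNP. The 4-bit one-hot codes and the function names are assumptions for illustration, not the authors' implementation.

```python
from collections import Counter

# Assumed 4-bit one-hot codes; the abstract does not give the actual scheme.
BINARY_CODE = {
    "A": (1, 0, 0, 0),
    "T": (0, 1, 0, 0),
    "C": (0, 0, 1, 0),
    "G": (0, 0, 0, 1),
}


def composition(segment):
    """Fraction of each nucleotide in the segment, in A, T, C, G order."""
    counts = Counter(segment.upper())
    total = len(segment)
    return [counts[base] / total for base in "ATCG"]


def binary_encode(segment):
    """Concatenate the assumed binary code of every position in the segment."""
    bits = []
    for base in segment.upper():
        bits.extend(BINARY_CODE[base])
    return bits


def features(segment):
    """Composition features followed by position-wise binary features."""
    return composition(segment) + binary_encode(segment)
```

A feature vector built this way has 4 composition values plus 4 bits per position, so every segment must have the same length before the vectors are passed to a classifier such as a random forest.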
Affiliation(s)
- Wenling Li: College of Chemistry, Sichuan University, Chengdu, 610064, People's Republic of China
- Menglong Li: College of Chemistry, Sichuan University, Chengdu, 610064, People's Republic of China
- Xuemei Pu: College of Chemistry, Sichuan University, Chengdu, 610064, People's Republic of China
- Yanzhi Guo: College of Chemistry, Sichuan University, Chengdu, 610064, People's Republic of China
2
Xie J, Li R, Li S, Ran X, Wang J, Jiang J, Zhao P. Identification of Copy Number Variations in Xiang and Kele Pigs. PLoS One 2016; 11:e0148565. [PMID: 26840413 PMCID: PMC4740446 DOI: 10.1371/journal.pone.0148565] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 08/06/2015] [Accepted: 01/19/2016] [Indexed: 12/24/2022] [Open Access]
Abstract
Xiang and Kele pigs are two well-known local Chinese pig breeds that possess rich genetic resources and have enormous economic and scientific value. We performed a comprehensive genomic analysis of the copy number variations (CNVs) in these breeds. CNVs are one of the most important forms of genomic variation and have profound effects on phenotypic variation. In this study, PorcineSNP60 genotyping data from 98 Xiang pigs and 22 Kele pigs were used to identify CNVs. In total, 172 candidate CNV regions (CNVRs) were identified, ranging from 3.19 kb to 8175.26 kb and covering 80.41 Mb of the pig genome. Approximately 56.40% (97/172) of the CNVRs overlapped with those identified in seven previous studies, and 43.60% (75/172) of the identified CNVRs were novel. Of the identified CNVRs, 82 (47 gain, 33 loss, and two gain-loss events that covered 4.58 Mb of the pig genome) were found only in a Xiang population with a large litter size. In contrast, 13 CNVRs (8 gain and 5 loss events) were unique to a Xiang population with small litter sizes, and 30 CNVRs (14 loss and 16 gain events) were unique to Kele pigs. The CNVRs span approximately 660 annotated Sus scrofa genes that are significantly enriched for specific biological functions, such as sensory perception, cognition, reproduction, ATP biosynthetic processes, and neurological processes. Many CNVR-associated genes, particularly the genes involved in reproductive traits, differed between the Xiang populations with large and small litter sizes, and these genes warrant further investigation due to their importance in determining the reproductive performance of Xiang pigs. Our results provide meaningful information about genomic variation, which may be useful in future assessments of the associations between CNVs and important phenotypes in Xiang and Kele pigs to ultimately help protect these rare breeds.
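The split into previously reported and novel CNVRs comes down to an interval-overlap check against earlier studies. The sketch below shows one way to do that check and confirms the percentages quoted in the abstract. The tuple layout `(chromosome, start, end)`, the "at least one shared base" overlap criterion, and the function names are assumptions; the abstract does not state which overlap rule was used.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) regions share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def split_by_overlap(regions, known):
    """Separate regions that overlap any known region from those that do not."""
    shared = [r for r in regions if any(overlaps(r, k) for k in known)]
    novel = [r for r in regions if r not in shared]
    return shared, novel


def percent(part, whole):
    """Percentage rounded to two decimals, as reported in the abstract."""
    return round(100 * part / whole, 2)


# Counts taken from the abstract: 97 of 172 CNVRs overlap earlier studies.
print(percent(97, 172), percent(75, 172))
```

A stricter rule, such as requiring a minimum reciprocal overlap, would change only the `overlaps` function.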
Affiliation(s)
- Jian Xie: Institute of Agro-Bioengineering and College of Life Sciences, Guizhou University, Guiyang, China
- Rongrong Li: Institute of Agro-Bioengineering and College of Life Sciences, Guizhou University, Guiyang, China
- Sheng Li: Institute of Agro-Bioengineering and College of Life Sciences, Guizhou University, Guiyang, China
- Xueqin Ran (corresponding author): College of Animal Science, Guizhou University, Guiyang, China
- Jiafu Wang (corresponding author): Institute of Agro-Bioengineering and College of Life Sciences, Guizhou University, Guiyang, China
- Jicai Jiang: College of Animal Science and Technology, China Agricultural University, Beijing, China
- Pengju Zhao: College of Animal Science and Technology, China Agricultural University, Beijing, China
3
Nicolazzi EL, Biffani S, Biscarini F, Orozco Ter Wengel P, Caprera A, Nazzicari N, Stella A. Software solutions for the livestock genomics SNP array revolution. Anim Genet 2015; 46:343-53. [PMID: 25907889 DOI: 10.1111/age.12295] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Accepted: 03/07/2015] [Indexed: 02/04/2023]
Abstract
Since the beginning of the genomic era, the number of available single nucleotide polymorphism (SNP) arrays has grown considerably. In the bovine species alone, 11 SNP chips not completely covered by intellectual property are currently available, and the number is growing. Genomic/genotype data are not standardized, and this hampers their exchange and integration. In addition, software used to analyze these data usually requires non-standard (i.e. case-specific) input files which, considering the large amount of data to be handled, take at least some programming skill to produce. In this work, we describe a software toolkit for SNP array data management, imputation, genome-wide association studies, population genetics and genomic selection. However, this toolkit does not solve the critical need for standardization of the genotypic data and software input files. It only highlights the chaotic situation each researcher has to face on a daily basis and gives some helpful advice on the currently available tools for navigating the complexity of SNP array data.
Affiliation(s)
- E L Nicolazzi: Fondazione Parco Tecnologico Padano (PTP), Via Einstein, Cascina Codazza, Lodi, 26900, Italy
- S Biffani: Istituto di biologia e biotecnologia Agraria (IBBA-CNR), Consiglio Nazionale delle Ricerche, Via Einstein, Cascina Codazza, Lodi, 26900, Italy
- F Biscarini: Fondazione Parco Tecnologico Padano (PTP), Via Einstein, Cascina Codazza, Lodi, 26900, Italy
- P Orozco Ter Wengel: School of Biosciences, Cardiff University, Museum Avenue, Cardiff, CF10 3AX, UK
- A Caprera: Fondazione Parco Tecnologico Padano (PTP), Via Einstein, Cascina Codazza, Lodi, 26900, Italy
- N Nazzicari: Fondazione Parco Tecnologico Padano (PTP), Via Einstein, Cascina Codazza, Lodi, 26900, Italy
- A Stella: Fondazione Parco Tecnologico Padano (PTP), Via Einstein, Cascina Codazza, Lodi, 26900, Italy; Istituto di biologia e biotecnologia Agraria (IBBA-CNR), Consiglio Nazionale delle Ricerche, Via Einstein, Cascina Codazza, Lodi, 26900, Italy
4
Kim K, Kwak W, Sung SS, Cho S, Kim H, Yoon D, Lee HJ. A novel genetic variant database for Korean native cattle (Hanwoo): HanwooGDB. Genes Genomics 2015. [DOI: 10.1007/s13258-014-0224-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/18/2022]
5
Strillacci MG, Frigo E, Schiavini F, Samoré AB, Canavesi F, Vevey M, Cozzi MC, Soller M, Lipkin E, Bagnato A. Genome-wide association study for somatic cell score in Valdostana Red Pied cattle breed using pooled DNA. BMC Genet 2014; 15:106. [PMID: 25288516 PMCID: PMC4198737 DOI: 10.1186/s12863-014-0106-7] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 07/01/2014] [Accepted: 09/25/2014] [Indexed: 01/07/2023] [Open Access]
Abstract
BACKGROUND: Mastitis is a major disease of dairy cattle that occurs in response to environmental exposure to infective agents and has a great economic impact on the dairy industry. Somatic cell count (SCC) and its log transformation, somatic cell score (SCS), are traits that have been used for decades in selective breeding as indirect measures of resistance to mastitis. A selective DNA pooling (SDP) approach was applied to identify quantitative trait loci (QTL) for SCS in Valdostana Red Pied cattle using the Illumina Bovine HD BeadChip.
RESULTS: A total of 171 SNPs reached genome-wide significance for association with SCS. Fifty-two SNPs were annotated within genes, some of which are involved in the immune response to mastitis. The largest numbers of markers associated with the trait were found on BTAs 1, 2, 3, 4, 9, 13, 15, 17, 21 and 22. These results identified novel genomic regions related to mastitis (1-Mb SNP windows) and confirmed regions already mapped. The largest number of SNPs exceeding the genome-wide significance threshold was found on BTA 15, at 50.43-51.63 Mb.
CONCLUSIONS: The genomic regions identified in this study contribute to a better understanding of the genetic control of the mastitis immune response in cattle and may allow the inclusion of more detailed QTL information in selection programs.
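The abstract mentions only that SCS is a log transformation of SCC. A widely used definition in dairy recording is SCS = log2(SCC / 100,000) + 3, with SCC in cells per mL; the sketch below assumes that definition, which may differ from the exact transformation used in this study. The function name is hypothetical.

```python
import math


def somatic_cell_score(scc_cells_per_ml):
    """Convert somatic cell count (cells/mL) to somatic cell score.

    Assumes the common definition SCS = log2(SCC / 100,000) + 3.
    """
    if scc_cells_per_ml <= 0:
        raise ValueError("SCC must be positive")
    return math.log2(scc_cells_per_ml / 100_000) + 3


# Each doubling of SCC raises the score by one unit.
for scc in (100_000, 200_000, 400_000):
    print(scc, somatic_cell_score(scc))
```

The transformation compresses the strongly right-skewed SCC distribution, which is why SCS rather than raw SCC is usually the trait analyzed.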
Affiliation(s)
- Alessandro Bagnato: Department of Health, Animal Science and Food Safety (VESPA), University of Milan, Via Celoria 10, Milan, 20133, Italy
6
Nicolazzi EL, Picciolini M, Strozzi F, Schnabel RD, Lawley C, Pirani A, Brew F, Stella A. SNPchiMp: a database to disentangle the SNPchip jungle in bovine livestock. BMC Genomics 2014; 15:123. [PMID: 24517501 PMCID: PMC3923093 DOI: 10.1186/1471-2164-15-123] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.0] [Received: 09/03/2013] [Accepted: 02/06/2014] [Indexed: 12/25/2022] [Open Access]
Abstract
BACKGROUND: Currently, six commercial whole-genome SNP chips are available for cattle genotyping, produced by two different genotyping platforms. Technical issues need to be addressed to combine data that originate from the different platforms, or from different versions of the same array generated by the manufacturer. For example: i) genome coordinates for SNPs may refer to different genome assemblies; ii) reference genome sequences are updated over time, changing the positions of, or even removing, sequences that contain SNPs; iii) not all commercial SNP IDs are searchable within public databases; iv) SNPs can be coded using different formats and referencing different strands (e.g. A/B or A/C/T/G alleles, referencing forward/reverse, top/bottom or plus/minus strand); v) because new information keeps being discovered, higher density chips do not necessarily include all the SNPs present in the lower density chips; and vi) SNP IDs may not be consistent across chips and platforms. Most researchers and breed associations manage SNP data in real time and thus require tools to standardise data in a user-friendly manner.
DESCRIPTION: Here we present SNPchiMp, a MySQL database linked to an open access web-based interface. Features of this interface include, but are not limited to, the following functions: 1) referencing the SNP mapping information to the latest genome assembly, 2) extraction of information contained in dbSNP for SNPs present in all commercially available bovine chips, and 3) identification of SNPs in common between two or more bovine chips (e.g. for SNP imputation from lower to higher density). In addition, SNPchiMp can retrieve this information on subsets of SNPs, accessing such data either via physical position on a supported assembly, or by a list of SNP IDs, rs or ss identifiers.
CONCLUSIONS: This tool combines many different sources of information that are otherwise time-consuming to obtain and difficult to integrate. SNPchiMp not only provides the information in a user-friendly format, but also enables researchers to perform a large number of operations with a few clicks of the mouse. This significantly reduces the time needed to execute the large number of operations required to manage SNP data.
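Because SNP IDs may not be consistent across chips (point vi above), one way to find SNPs shared by two chips is to match on physical position within a single assembly instead of on ID. The sketch below illustrates that idea with plain Python dictionaries. The manifests, SNP names, and positions are hypothetical, and this is not how SNPchiMp itself, a MySQL database, is implemented.

```python
# Hypothetical manifests: chip-specific SNP ID -> (chromosome, position),
# with both chips assumed to be mapped to the same genome assembly.
chip_low = {"snpA": ("1", 1050), "snpB": ("1", 20400), "snpC": ("2", 730)}
chip_high = {"hd_001": ("1", 1050), "hd_002": ("2", 730), "hd_003": ("3", 9100)}


def common_by_position(chip_a, chip_b):
    """Map each SNP ID on chip_a to the chip_b ID at the same position."""
    by_pos_b = {pos: snp_id for snp_id, pos in chip_b.items()}
    return {
        snp_id: by_pos_b[pos]
        for snp_id, pos in chip_a.items()
        if pos in by_pos_b
    }


shared = common_by_position(chip_low, chip_high)
print(shared)
```

Matching on position is only valid when both manifests refer to the same assembly, which is why remapping to a common assembly has to come first.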
7
Fortes MR, DeAtley KL, Lehnert SA, Burns BM, Reverter A, Hawken RJ, Boe-Hansen G, Moore SS, Thomas MG. Genomic regions associated with fertility traits in male and female cattle: Advances from microsatellites to high-density chips and beyond. Anim Reprod Sci 2013; 141:1-19. [DOI: 10.1016/j.anireprosci.2013.07.002] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Received: 11/26/2012] [Revised: 07/03/2013] [Accepted: 07/07/2013] [Indexed: 01/08/2023]