1
|
Zhang Y, Wang Z, Zeng Y, Liu Y, Xiong S, Wang M, Zhou J, Zou Q. A novel convolution attention model for predicting transcription factor binding sites by combination of sequence and shape. Brief Bioinform 2021; 23:6470969. [PMID: 34929739 DOI: 10.1093/bib/bbab525] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 09/14/2021] [Revised: 10/28/2021] [Accepted: 11/13/2021] [Indexed: 12/17/2022] Open
Abstract
The discovery of putative transcription factor binding sites (TFBSs) is important for understanding the underlying binding mechanism and cellular functions. Recently, many computational methods have been proposed to jointly account for DNA sequence and shape properties in TFBS prediction. However, these methods fail to fully utilize the latent features derived from both sequence and shape profiles and have limitations in interpretability and knowledge discovery. To this end, we present a novel Deep Convolution Attention network combining Sequence and Shape, dubbed D-SSCA, for precisely predicting putative TFBSs. Experiments conducted on 165 ENCODE ChIP-seq datasets reveal that D-SSCA significantly outperforms several state-of-the-art methods in predicting TFBSs and justify the utility of the channel attention module for feature refinement. In addition, a thorough analysis of the contribution of five shapes to TFBS prediction demonstrates that shape features can improve the predictive power for transcription factor-DNA binding. Furthermore, D-SSCA can realize cross-cell-line prediction of TFBSs, indicating the occupancy of common interplay patterns concerning both sequence and shape across various cell lines. The source code of D-SSCA can be found at https://github.com/MoonLord0525/.
Affiliation(s)
- Yongqing Zhang
  - School of Computer Science, Chengdu University of Information Technology, 610225, Chengdu, China
  - School of Computer Science and Engineering, University of Electronic Science and Technology of China, 611731, Chengdu, China
- Zixuan Wang
  - School of Computer Science, Chengdu University of Information Technology, 610225, Chengdu, China
- Yuanqi Zeng
  - School of Computer Science, Chengdu University of Information Technology, 610225, Chengdu, China
- Yuhang Liu
  - School of Computer Science, Chengdu University of Information Technology, 610225, Chengdu, China
- Shuwen Xiong
  - School of Computer Science, Chengdu University of Information Technology, 610225, Chengdu, China
- Maocheng Wang
  - School of Computer Science, Chengdu University of Information Technology, 610225, Chengdu, China
- Jiliu Zhou
  - School of Computer Science, Chengdu University of Information Technology, 610225, Chengdu, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, 610054, Chengdu, China
|
2
|
Shi L, Wu X, Yang Y, Ma Z, Lv X, Liu L, Li Y, Zhao F, Han B, Sun D. A post-GWAS confirming the genetic effects and functional polymorphisms of AGPAT3 gene on milk fatty acids in dairy cattle. J Anim Sci Biotechnol 2021; 12:24. [PMID: 33522959 PMCID: PMC7849138 DOI: 10.1186/s40104-020-00540-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/25/2020] [Accepted: 12/14/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND People are paying more attention to a healthy and balanced diet as their living standards improve. Milk fatty acids (FAs) have been reported to be related to atherosclerosis and coronary heart disease in humans. In our previous genome-wide association study (GWAS) on milk FAs in dairy cattle, 83 genome-wide significant single nucleotide polymorphisms (SNPs) were detected. Among them, two SNPs, ARS-BFGL-NGS-109493 and BTA-56389-no-rs, associated with C18index (P = 0.0459), were located upstream of the 1-acylglycerol-3-phosphate O-acyltransferase 3 (AGPAT3) gene. AGPAT3 is involved in glycerol-lipid and glycerol-phospholipid metabolism and phospholipase D signaling pathways. Hence, it was inferred to be a candidate gene for milk FAs. The aim of this study was to further confirm the genetic effects of the AGPAT3 gene on milk FA traits in dairy cattle.
RESULTS By re-sequencing the complete coding region and 3000 bp of the 5' and 3' regulatory regions of the AGPAT3 gene, a total of 17 SNPs were identified, including four in the 5' regulatory region, one in the 5' untranslated region (UTR), three in introns, one in the 3' UTR, and eight in the 3' regulatory region. Linkage disequilibrium (LD) analysis with Haploview 4.1 software revealed two haplotype blocks, formed by four and 12 of the identified SNPs, respectively. Using SAS 9.2, we performed single-locus and haplotype-based association analyses on 24 milk FAs in 1065 Chinese Holstein cows, and found that all the SNPs and the haplotype blocks were significantly associated with C6:0, C8:0 and C10:0 (P < 0.0001-0.0384). Further, with Genomatix, we predicted that four SNPs in the 5' regulatory region (g.146702957G > A, g.146704373A > G, g.146704618A > G and g.146704699G > A) changed the transcription factor binding sites (TFBSs) for transcription factors SMARCA3, REX1, VMYB, BRACH, NKX26, ZBED4, SP1, USF1, ARNT and FOXA1. Of these, two SNPs were validated by luciferase assay to affect transcriptional activity: allele A of both g.146704373A > G and g.146704618A > G increased the transcriptional activity of the AGPAT3 promoter compared with allele G (P = 0.0004).
CONCLUSIONS Our findings are the first to demonstrate significant genetic associations of the AGPAT3 gene with milk FAs in dairy cattle, and two potential causal mutations were detected.
Affiliation(s)
- Lijun Shi
  - Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, No. 2 Yuanmingyuan West Road, Haidian District, Beijing, 100193, China
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Xin Wu
  - Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, No. 2 Yuanmingyuan West Road, Haidian District, Beijing, 100193, China
- Yuze Yang
  - Beijing General Station of Animal Husbandry, Beijing, 100101, China
- Zhu Ma
  - Beijing Dairy Cattle Center, Beijing, 100192, China
- Xiaoqing Lv
  - Beijing Dairy Cattle Center, Beijing, 100192, China
- Lin Liu
  - Beijing Dairy Cattle Center, Beijing, 100192, China
- Yanhua Li
  - Beijing Dairy Cattle Center, Beijing, 100192, China
- Feng Zhao
  - Beijing Dairy Cattle Center, Beijing, 100192, China
- Bo Han
  - Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, No. 2 Yuanmingyuan West Road, Haidian District, Beijing, 100193, China
- Dongxiao Sun
  - Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, No. 2 Yuanmingyuan West Road, Haidian District, Beijing, 100193, China
|
3
|
Zhou J, Lu Q, Xu R, Gui L, Wang H. Prediction of TF-Binding Site by Inclusion of Higher Order Position Dependencies. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:1383-1393. [PMID: 30629513 DOI: 10.1109/tcbb.2019.2892124] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/09/2023]
Abstract
Most proposed methods for TF-binding site (TFBS) predictions only use low order dependencies for predictions due to the lack of efficient methods to extract higher order dependencies. In this work, we first propose a novel method to extract higher order dependencies by applying CNN on histone modification features. We then propose a novel TFBS prediction method, referred to as CNN_TF, by incorporating low order and higher order dependencies. CNN_TF is first evaluated on 13 TFs in the mES cell. Results show that using higher order dependencies outperforms low order dependencies significantly on 11 TFs. This indicates that higher order dependencies are indeed more effective for TFBS predictions than low order dependencies. Further experiments show that using both low order dependencies and higher order dependencies improves performance significantly on 12 TFs, indicating the two dependency types are complementary. To evaluate the influence of cell-types on prediction performances, CNN_TF was applied to five TFs in five cell-types of humans. Even though low order dependencies and higher order dependencies show different contributions in different cell-types, they are always complementary in predictions. When comparing to several state-of-the-art methods, CNN_TF outperforms them by at least 5.3 percent in AUPR.
|
4
|
Banirazi Motlagh N, Mohammadpour Esfahani B, Ashrafi B, Zare-Mirakabad F. The assessment of histone acetylation marks in the vicinity of transcription factor binding sites in human CD4+ T cells using information theory methods. Comput Biol Chem 2020; 86:107232. [PMID: 32142982 DOI: 10.1016/j.compbiolchem.2020.107232] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/10/2018] [Revised: 01/29/2019] [Accepted: 02/08/2020] [Indexed: 11/24/2022]
Abstract
The genetic information encoded in structural genes is decoded by an intracellular process called gene expression. This mechanism is regulated by epigenetic processes such as histone acetylation. Histone acetylation, which happens in nucleosomes, exposes DNA (genome) to transcription factors. Therefore, the correlation between histone acetylation and gene expression has been assessed as a fundamental issue in many previous studies. In the proposed research, we investigate which marks of histone acetylation are informative and which ones are redundant in the vicinity of SP1 transcription factor binding sites in human CD4+ T cells. To achieve this, we use information theory methods. Subsequently, we apply a multilayer perceptron neural network to show that the histone acetylation marks selected by information theory methods are sufficiently informative. Finally, we use the neural network to predict binding sites of 17 other transcription factors on chromosomes 1 and 2. The results suggest that the information conveyed by the selected histone acetylation marks is equivalent to that of all 18 marks associated with SP1 transcription factor binding sites on chromosome 1. Furthermore, almost 91.75% of SP1 binding sites on chromosome 2 are predicted by the selected histone acetylation marks, while all 18 marks predict 90.56% correctly. Moreover, the selected histone acetylation marks are efficient at predicting 17 other types of transcription factor binding sites.
Affiliation(s)
- Nafiseh Banirazi Motlagh
  - Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, Iran
- Behnoosh Ashrafi
  - Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, Iran
- Fatemeh Zare-Mirakabad
  - Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, Iran
|
5
|
Shrestha S, Sewell JA, Santoso CS, Forchielli E, Carrasco Pro S, Martinez M, Fuxman Bass JI. Discovering human transcription factor physical interactions with genetic variants, novel DNA motifs, and repetitive elements using enhanced yeast one-hybrid assays. Genome Res 2020; 29:1533-1544. [PMID: 31481462 PMCID: PMC6724672 DOI: 10.1101/gr.248823.119] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/24/2019] [Accepted: 07/23/2019] [Indexed: 12/29/2022]
Abstract
Identifying transcription factor (TF) binding to noncoding variants, uncharacterized DNA motifs, and repetitive genomic elements has been technically and computationally challenging. Current experimental methods, such as chromatin immunoprecipitation, generally test one TF at a time, and computational motif algorithms often lead to false-positive and -negative predictions. To address these limitations, we developed an experimental approach based on enhanced yeast one-hybrid assays. The first variation of this approach interrogates the binding of >1000 human TFs to repetitive DNA elements, while the second evaluates TF binding to single nucleotide variants, short insertions and deletions (indels), and novel DNA motifs. Using this approach, we detected the binding of 75 TFs, including several nuclear hormone receptors and ETS factors, to the highly repetitive Alu elements. Further, we identified cancer-associated changes in TF binding, including gain of interactions involving ETS TFs and loss of interactions involving KLF TFs to different mutations in the TERT promoter, and gain of a MYB interaction with an 18-bp indel in the TAL1 superenhancer. Additionally, we identified TFs that bind to three uncharacterized DNA motifs identified in DNase footprinting assays. We anticipate that these enhanced yeast one-hybrid approaches will expand our capabilities to study genetic variation and undercharacterized genomic regions.
Affiliation(s)
- Shaleen Shrestha
  - Department of Biology, Boston University, Boston, Massachusetts 02215, USA
- Jared Allan Sewell
  - Department of Biology, Boston University, Boston, Massachusetts 02215, USA
- Elena Forchielli
  - Department of Biology, Boston University, Boston, Massachusetts 02215, USA
- Melissa Martinez
  - Department of Biology, Boston University, Boston, Massachusetts 02215, USA
- Juan Ignacio Fuxman Bass
  - Department of Biology, Boston University, Boston, Massachusetts 02215, USA
  - Bioinformatics Program, Boston University, Boston, Massachusetts 02215, USA
|
6
|
Xu T, Zheng X, Li B, Jin P, Qin Z, Wu H. A comprehensive review of computational prediction of genome-wide features. Brief Bioinform 2020; 21:120-134. [PMID: 30462144 PMCID: PMC10233247 DOI: 10.1093/bib/bby110] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/10/2018] [Revised: 10/15/2018] [Accepted: 10/16/2018] [Indexed: 12/15/2022] Open
Abstract
There are significant correlations among different types of genetic, genomic and epigenomic features within the genome. These correlations make the in silico feature prediction possible through statistical or machine learning models. With the accumulation of a vast amount of high-throughput data, feature prediction has gained significant interest lately, and a plethora of papers have been published in the past few years. Here we provide a comprehensive review on these published works, categorized by the prediction targets, including protein binding site, enhancer, DNA methylation, chromatin structure and gene expression. We also provide discussions on some important points and possible future directions.
Affiliation(s)
- Tianlei Xu
  - Department of Mathematics and Computer Science, Emory University, Atlanta, GA, USA
- Xiaoqi Zheng
  - Department of Mathematics, Shanghai Normal University, Shanghai, China
- Ben Li
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
- Peng Jin
  - Department of Human Genetics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
- Zhaohui Qin
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
- Hao Wu
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
|
7
|
Han B, Yuan Y, Shi L, Li Y, Liu L, Sun D. Identification of single nucleotide polymorphisms of PIK3R1 and DUSP1 genes and their genetic associations with milk production traits in dairy cows. J Anim Sci Biotechnol 2019; 10:81. [PMID: 31709048 PMCID: PMC6833155 DOI: 10.1186/s40104-019-0392-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 05/01/2019] [Accepted: 09/06/2019] [Indexed: 01/15/2023] Open
Abstract
Background Previously, phosphoinositide-3-kinase regulatory subunit 1 (PIK3R1) and dual specificity phosphatase 1 (DUSP1) were identified as promising candidate genes for milk production traits because they were differentially expressed between the dry period and the peak of lactation in livers of dairy cows. Hence, in this study, the single nucleotide polymorphisms (SNPs) of PIK3R1 and DUSP1 genes were identified and their genetic associations with milk yield, fat yield, fat percentage, protein yield, and protein percentage were investigated using 1067 Chinese Holstein cows from 40 sire families.
Results By re-sequencing the entire coding region and 2000 bp of the 5′ and 3′ flanking regions of the two genes, one SNP in the 5′ untranslated region (UTR), three in the 3′ UTR, and two in the 3′ flanking region of PIK3R1 were identified, and one in the 5′ flanking region, one in the 3′ UTR, and two in the 3′ flanking region of DUSP1 were found. Subsequent single-locus association analyses showed that six SNPs in PIK3R1, rs42590258, rs210389799, rs208819656, rs41255622, rs133655926, and rs211408208, and four SNPs in DUSP1, rs207593520, rs208460068, rs209154772, and rs210000760, were significantly associated with milk, fat and protein yields in the first or second lactation (P values ≤ 0.0001 and 0.0461). In addition, using Haploview 4.2 software, the six and four SNPs in PIK3R1 and DUSP1, respectively, each formed one haplotype block, and the haplotype-based association analyses showed significant associations between their haplotype combinations and the milk traits in both lactations (P values ≤ 0.0001 and 0.0364). One SNP, rs207593520(T/G), was predicted to alter the transcription factor binding sites (TFBSs) in the 5′ flanking region of DUSP1. Further, the dual-luciferase assay showed that the transcriptional activity of allele T in rs207593520 was significantly higher than that of allele G, suggesting that allele T of rs207593520 activates transcription of the DUSP1 gene. Thus, the rs207593520 SNP was highlighted as a potential causal mutation that should be further verified.
Conclusions We demonstrated novel and significant genetic effects of the PIK3R1 and DUSP1 genes on milk production traits in dairy cows, and our findings provide information for use in dairy cattle breeding.
Affiliation(s)
- Bo Han
  - Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, No. 2 Yuanmingyuan West Road, Haidian District, Beijing, 100193 China
- Yuwei Yuan
  - Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, No. 2 Yuanmingyuan West Road, Haidian District, Beijing, 100193 China
- Lijun Shi
  - Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, No. 2 Yuanmingyuan West Road, Haidian District, Beijing, 100193 China
- Yanhua Li
  - Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, No. 2 Yuanmingyuan West Road, Haidian District, Beijing, 100193 China
  - Beijing Dairy Cattle Center, Beijing, 100192 China
- Lin Liu
  - Beijing Dairy Cattle Center, Beijing, 100192 China
- Dongxiao Sun
  - Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, No. 2 Yuanmingyuan West Road, Haidian District, Beijing, 100193 China
|
8
|
Lan G, Zhou J, Xu R, Lu Q, Wang H. Cross-Cell-Type Prediction of TF-Binding Site by Integrating Convolutional Neural Network and Adversarial Network. Int J Mol Sci 2019; 20:ijms20143425. [PMID: 31336830 PMCID: PMC6679139 DOI: 10.3390/ijms20143425] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 05/13/2019] [Revised: 06/27/2019] [Accepted: 07/08/2019] [Indexed: 01/18/2023] Open
Abstract
Transcription factor binding sites (TFBSs) play an important role in gene expression regulation. Many computational methods for TFBS prediction need sufficient labeled data. However, many transcription factors (TFs) lack labeled data in cell types. We propose a novel method, referred to as DANN_TF, for TFBS prediction. DANN_TF consists of a feature extractor, a label predictor, and a domain classifier. The feature extractor and the domain classifier constitute an Adversarial Network, which ensures that learned features are common features across different cell types. DANN_TF is evaluated on five TFs in five cell types with a total of 25 cell-type TF pairs and compared to a baseline method which does not use Adversarial Network. For both data augmentation and cross-cell-type prediction, DANN_TF performs better than the baseline method on most cell-type TF pairs. DANN_TF is further evaluated by an additional 13 TFs in the five cell types with a total of 65 cell-type TF pairs. Results show that DANN_TF achieves significantly higher AUC than the baseline method on 96.9% pairs of the 65 cell-type TF pairs. This is a strong indication that DANN_TF can indeed learn common features for cross-cell-type TFBS prediction.
Affiliation(s)
- Gongqiang Lan
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China
- Jiyun Zhou
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China
- Ruifeng Xu
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China
- Qin Lu
  - Department of Computing, The Hong Kong Polytechnic University, Hong Kong 810005, China
- Hongpeng Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China
|
9
|
Han B, Yuan Y, Li Y, Liu L, Sun D. Single Nucleotide Polymorphisms of NUCB2 and their Genetic Associations with Milk Production Traits in Dairy Cows. Genes (Basel) 2019; 10:E449. [PMID: 31200542 PMCID: PMC6627143 DOI: 10.3390/genes10060449] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 05/06/2019] [Revised: 06/04/2019] [Accepted: 06/12/2019] [Indexed: 02/07/2023] Open
Abstract
We previously used RNA sequencing to profile the hepatic transcriptome of Chinese Holstein cows across the dry period, early lactation, and peak of lactation, and the results implied that the nucleobindin 2 (NUCB2) gene might be associated with milk production traits because its expression was significantly increased in early lactation or peak of lactation compared with the dry period (q value < 0.05). Hence, in this study, we detected the single nucleotide polymorphisms (SNPs) of NUCB2 and analyzed their genetic associations with milk yield, fat yield, fat percentage, protein yield, and protein percentage. We re-sequenced the entire coding region and 2000 bp of the 5' and 3' flanking regions of NUCB2 by pooled sequencing, and identified ten SNPs, including one in the 5' flanking region, two in the 3' untranslated region (UTR), and seven in the 3' flanking region. The single-SNP association analysis showed that the ten SNPs were significantly associated with milk yield, fat yield, fat percentage, protein yield, or protein percentage in the first or second lactation (p values ≤ 1 × 10⁻⁴ and 0.05). In addition, we estimated the linkage disequilibrium (LD) of the ten SNPs with Haploview 4.2, and found that the SNPs were highly linked in one haplotype block (D' = 0.98-1.00), and the block was also significantly associated with at least one milk trait in the two lactations (p values: 0.0002-0.047). Further, we predicted the changes in transcription factor binding sites (TFBSs) caused by the SNPs in the 5' flanking region of NUCB2, and considered that g.35735477C>T might affect the expression of NUCB2 by changing the TFBSs for ETS transcription factor 3 (ELF3), caudal type homeobox 2 (CDX2), mammalian C-type LTR TATA box (VTATA), nuclear factor of activated T-cells (NFAT), and v-ets erythroblastosis virus E26 oncogene homolog (ERG) (matrix similarity threshold, MST > 0.85). However, further study should be performed to verify the regulatory mechanisms of NUCB2 and its polymorphisms on milk traits. Our findings revealed for the first time the genetic effects of NUCB2 on milk traits in dairy cows, and suggested that the significant SNPs could be used in genomic selection to improve the accuracy of selection for dairy cattle breeding.
Affiliation(s)
- Bo Han
  - Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, Beijing 100193, China
- Yuwei Yuan
  - Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, Beijing 100193, China
- Yanhua Li
  - Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, Beijing 100193, China
  - Beijing Key Laboratory of Dairy Cattle Genetic, Breeding and Reproduction, Beijing Dairy Cattle Center, Beijing 100192, China
- Lin Liu
  - Beijing Key Laboratory of Dairy Cattle Genetic, Breeding and Reproduction, Beijing Dairy Cattle Center, Beijing 100192, China
- Dongxiao Sun
  - Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, Beijing 100193, China
|
10
|
Cui XJ, Cai L, Xing YQ, Zhao XJ, Shi CX. Influence factors on the correlations between expression levels of neighboring pattern genes. Biosystems 2015; 139:23-8. [PMID: 26696439 DOI: 10.1016/j.biosystems.2015.11.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2015] [Revised: 10/07/2015] [Accepted: 11/23/2015] [Indexed: 10/22/2022]
Abstract
Some genes tend to cluster and be co-expressed. Multiple factors affect gene co-expression. In this study, we investigated the relationships between multiple factors and the correlations of expression levels of neighboring genes, which were divided into four kinds of pattern genes and one type of non-pattern gene. Our results indicate that the correlation between expression levels of neighboring non-pattern genes is related to multiple factors with the exception of transcriptional orientations of neighboring genes. The correlation between expression levels of neighboring specific genes or neighboring repressed genes is likely to be dependent on the co-functions of neighboring genes. The correlation between expression levels of neighboring housekeeping genes is associated with histone modifications in intergenic regions.
Affiliation(s)
- Xiang-Jun Cui
  - School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science and Technology, Baotou 014010, China
- Lu Cai
  - School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science and Technology, Baotou 014010, China
- Yong-Qiang Xing
  - School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science and Technology, Baotou 014010, China
- Xiu-Juan Zhao
  - School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science and Technology, Baotou 014010, China
- Chen-Xia Shi
  - School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science and Technology, Baotou 014010, China
|
11
|
Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast. PLoS Comput Biol 2015; 11:e1004418. [PMID: 26291518 PMCID: PMC4546298 DOI: 10.1371/journal.pcbi.1004418] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 11/19/2014] [Accepted: 06/29/2015] [Indexed: 11/19/2022] Open
Abstract
Transcription factor (TF) binding is determined by the presence of specific sequence motifs (SM) and chromatin accessibility, where the latter is influenced by both chromatin state (CS) and DNA structure (DS) properties. Although SM, CS, and DS have been used to predict TF binding sites, a predictive model that jointly considers CS and DS has not been developed to predict either TF-specific binding or general binding properties of TFs. Using budding yeast as model, we found that machine learning classifiers trained with either CS or DS features alone perform better in predicting TF-specific binding compared to SM-based classifiers. In addition, simultaneously considering CS and DS further improves the accuracy of the TF binding predictions, indicating the highly complementary nature of these two properties. The contributions of SM, CS, and DS features to binding site predictions differ greatly between TFs, allowing TF-specific predictions and potentially reflecting different TF binding mechanisms. In addition, a "TF-agnostic" predictive model based on three DNA “intrinsic properties” (in silico predicted nucleosome occupancy, major groove geometry, and dinucleotide free energy) that can be calculated from genomic sequences alone has performance that rivals the model incorporating experiment-derived data. This intrinsic property model allows prediction of binding regions not only across TFs, but also across DNA-binding domain families with distinct structural folds. Furthermore, these predicted binding regions can help identify TF binding sites that have a significant impact on target gene expression. Because the intrinsic property model allows prediction of binding regions across DNA-binding domain families, it is TF agnostic and likely describes general binding potential of TFs. Thus, our findings suggest that it is feasible to establish a TF agnostic model for identifying functional regulatory regions in potentially any sequenced genome. 
Identification of transcription factor binding sites based on sequence motifs is typically accompanied by a high false positive rate. Increasing evidence suggests that there are many other factors besides DNA sequence that may affect the binding and interaction of TFs with DNA. Through the integration of sequence motif, chromatin state, and DNA structure properties, we show that TF binding can be better predicted. Moreover, considering chromatin state and DNA structure properties simultaneously yields a significant improvement. While the binding of some TFs can be readily predicted using either chromatin state information or DNA structure, other TFs need both. Thus, our findings provide insights on how different histone modifications and DNA structure properties may influence the binding of a particular TF and thus how TFs regulate gene expression. These features are referred to as sequence “intrinsic properties” because they can be predicted from sequences alone. These intrinsic properties can be used to build a TF binding prediction model that has a similar performance to considering all features. Moreover, the intrinsic property model allows TFBS predictions not only across TFs, but also across DNA-binding domain families that are present in most eukaryotes, suggesting that the model likely can be used across species.
|