1
Zhang L, Han L, Huang Y, Feng Z, Wang X, Li H, Song F, Liu L, Li J, Zheng H, Wang P, Song F, Chen K. SNPs within microRNA binding sites and the prognosis of breast cancer. Aging (Albany NY) 2021; 13:7465-7480. [PMID: 33658398 PMCID: PMC7993692 DOI: 10.18632/aging.202612] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/31/2020] [Accepted: 12/29/2020] [Indexed: 12/25/2022]
Abstract
Single nucleotide polymorphisms (SNPs) within microRNA binding sites can affect the binding of microRNA to mRNA and regulate gene expression, thereby contributing to cancer prognosis. Here we performed a two-stage study of 2647 breast cancer patients to explore the association between SNPs within microRNA binding sites and breast cancer prognosis. In stage I, we genotyped 192 SNPs within microRNA binding sites using the Illumina Goldengate platform. In stage II, we validated SNPs associated with breast cancer prognosis in another dataset using the TaqMan platform. We identified 8 SNPs significantly associated with breast cancer prognosis in stage I (P<0.05), and only rs10878441 was statistically significant in stage II (AA vs CC, HR=2.21, 95% CI: 1.11-4.42, P=0.024). We combined the data from stage I and stage II, and found that, compared with the rs10878441 AA genotype, the CC genotype was associated with poor breast cancer survival (HR=2.19, 95% CI: 1.30-3.70, P=0.003). Stratified analyses demonstrated that rs10878441 was related to breast cancer prognosis in grade II and lymph node-negative patients (P<0.05). The Leucine-rich repeat kinase 2 (LRRK2) rs10878441 CC genotype is associated with poor prognosis of breast cancer in a Chinese population and may be used as a potential prognostic biomarker for breast cancer.
Affiliation(s)
- Liwen Zhang
- Department of Epidemiology and Biostatistics, Key Laboratory of Molecular Cancer Epidemiology of Tianjin, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin 300060, People's Republic of China
- Lu Han
- Department of Infection Control, Tianjin Huanhu Hospital, Tianjin 300350, People's Republic of China
- Yubei Huang
- Department of Epidemiology and Biostatistics, Key Laboratory of Molecular Cancer Epidemiology of Tianjin, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin 300060, People's Republic of China
- Ziwei Feng
- Department of Epidemiology and Biostatistics, Key Laboratory of Molecular Cancer Epidemiology of Tianjin, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin 300060, People's Republic of China
- Xin Wang
- Department of Epidemiology and Biostatistics, West China School of Public Health, Sichuan University, Sichuan 610041, People's Republic of China
- Haixin Li
- Department of Epidemiology and Biostatistics, Key Laboratory of Molecular Cancer Epidemiology of Tianjin, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin 300060, People's Republic of China; Department of Cancer Biobank, Key Laboratory of Cancer Prevention and Therapy of Tianjin, Tianjin's Clinical Research Center for Cancer, National Clinical Research Centre of Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin 300060, People's Republic of China
- Fangfang Song
- Department of Epidemiology and Biostatistics, Key Laboratory of Molecular Cancer Epidemiology of Tianjin, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin 300060, People's Republic of China
- Luyang Liu
- Department of Epidemiology and Biostatistics, Key Laboratory of Molecular Cancer Epidemiology of Tianjin, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin 300060, People's Republic of China
- Junxian Li
- Department of Epidemiology and Biostatistics, Key Laboratory of Molecular Cancer Epidemiology of Tianjin, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin 300060, People's Republic of China
- Hong Zheng
- Department of Epidemiology and Biostatistics, Key Laboratory of Molecular Cancer Epidemiology of Tianjin, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin 300060, People's Republic of China
- Peishan Wang
- Department of Epidemiology and Biostatistics, Key Laboratory of Molecular Cancer Epidemiology of Tianjin, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin 300060, People's Republic of China
- Fengju Song
- Department of Epidemiology and Biostatistics, Key Laboratory of Molecular Cancer Epidemiology of Tianjin, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin 300060, People's Republic of China
- Kexin Chen
- Department of Epidemiology and Biostatistics, Key Laboratory of Molecular Cancer Epidemiology of Tianjin, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin 300060, People's Republic of China
3
Wu C, Li S, Cui Y. Genetic association studies: an information content perspective. Curr Genomics 2012; 13:566-73. [PMID: 23633916 PMCID: PMC3468889 DOI: 10.2174/138920212803251382] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 05/14/2012] [Revised: 06/04/2012] [Accepted: 06/18/2012] [Indexed: 01/02/2023] Open
Abstract
The availability of high-density single nucleotide polymorphism (SNP) data has made it possible for human genetic association studies to identify common and rare variants underlying complex diseases on a genome-wide scale. A handful of novel genetic variants have been identified, which gives much hope and prospects for the future of genetic association studies. In this process, statistical and computational methods play key roles, among which information-based association tests have gained wide popularity. This paper is intended to give a comprehensive review of the current literature in genetic association analysis cast in the framework of information theory. We focus our review on the following topics: (1) information theoretic approaches in genetic linkage and association studies; (2) entropy-based strategies for optimal SNP subset selection; and (3) the usage of information-theoretic criteria in gene clustering and gene regulatory network construction.
Affiliation(s)
- Cen Wu
- Department of Statistics and Probability, Michigan State University, East Lansing, Michigan 48824
- Shaoyu Li
- Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN 38105
- Yuehua Cui
- Department of Statistics and Probability, Michigan State University, East Lansing, Michigan 48824
- Center for Computational Biology, Beijing Forestry University, Beijing, China 100083
5
Chuang LY, Yang CS, Ho CH, Yang CH. Tag SNP selection using particle swarm optimization. Biotechnol Prog 2010; 26:580-8. [PMID: 20039435 DOI: 10.1002/btpr.350] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/09/2022]
Abstract
Single nucleotide polymorphisms (SNPs) are the most abundant form of genetic variations amongst species. With the genome-wide SNP discovery, many genome-wide association studies are likely to identify multiple genetic variants that are associated with complex diseases. However, genotyping all existing SNPs for a large number of samples is still challenging even though SNP arrays have been developed to facilitate the task. Therefore, it is essential to select only informative SNPs representing the original SNP distributions in the genome (tag SNP selection) for genome-wide association studies. These SNPs are usually chosen from haplotypes and called haplotype tag SNPs (htSNPs). Accordingly, the scale and cost of genotyping are expected to be largely reduced. We introduce binary particle swarm optimization (BPSO) with local search capability to improve the prediction accuracy of STAMPA. The proposed method does not rely on block partitioning of the genomic region, and consistently identified tag SNPs with higher prediction accuracy than either STAMPA or SVM/STSA. We compared the prediction accuracy and time complexity of BPSO to STAMPA and an SVM-based (SVM/STSA) method using publicly available data sets. Compared with STAMPA and SVM/STSA, BPSO effectively improved prediction accuracy for both smaller and larger scale data sets. These results demonstrate that the BPSO method selects tag SNPs with higher accuracy regardless of the scale of the data sets used.
Affiliation(s)
- Li-Yeh Chuang
- Dept. of Chemical Engineering, I-Shou University, Kaohsiung, Taiwan
6
Liu L, Wu Y, Lonardi S, Jiang T. Efficient genome-wide TagSNP selection across populations via the linkage disequilibrium criterion. J Comput Biol 2010; 17:21-37. [PMID: 20078395 DOI: 10.1089/cmb.2007.0228] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022] Open
Abstract
In this article, we studied the tag single-nucleotide polymorphism (tagSNP) selection problem on multiple populations using the pairwise r² linkage disequilibrium criterion. We proposed a novel combinatorial optimization model for the tagSNP selection problem, called the minimum common tagSNP selection (MCTS) problem, and presented efficient solutions for MCTS. Our approach consists of the following three main steps: (i) partitioning the SNP markers into small disjoint components, (ii) applying some data reduction rules to simplify the problem, and (iii) applying either a fast greedy algorithm or a Lagrangian relaxation algorithm to solve the remaining (general) MCTS. These algorithms also provide lower bounds on tagging (i.e., the minimum number of tagSNPs needed). The lower bounds allow us to evaluate how far our solution is from the optimum. To the best of our knowledge, it is the first time the tagging lower bounds are discussed in the literature. We assessed the performance of our algorithms on real HapMap data for genome-wide tagging. The experiments demonstrated that our algorithms run 3-4 orders of magnitude faster than the existing single-population tagging programs such as FESTA, LD-Select, and the multiple-population tagging method MultiPop-TagSelect. Our method also greatly reduced the required tagSNPs compared with LD-Select on a single population and MultiPop-TagSelect on multiple populations. Moreover, the numbers of tagSNPs selected by our algorithms are almost optimal because they are very close to the corresponding lower bounds obtained by our method.
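To make the pairwise LD tagging criterion in the abstract above concrete, the following is a minimal sketch of a generic greedy set-cover heuristic for a single population. It is not the authors' MCTS algorithm (it has no partitioning, reduction rules, multi-population handling, or lower bounds). The function name `greedy_tag_snps`, the 0.8 threshold, and the toy `example_r2` matrix are illustrative assumptions.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Pick tag SNPs greedily so every SNP is in LD (r^2 >= threshold)
    with at least one chosen tag. Hypothetical helper for illustration."""
    n = len(r2)
    # covers[i] = set of SNPs that SNP i would tag (always including itself)
    covers = [
        {j for j in range(n) if j == i or r2[i][j] >= threshold}
        for i in range(n)
    ]
    uncovered = set(range(n))
    tags = []
    while uncovered:
        # Choose the SNP tagging the most still-uncovered SNPs;
        # ties are broken in favor of the lowest index.
        best = max(range(n), key=lambda i: (len(covers[i] & uncovered), -i))
        tags.append(best)
        uncovered -= covers[best]
    return tags


# Toy symmetric r^2 matrix (made-up values): SNPs 0-2 form one LD cluster,
# SNPs 3-4 form another.
example_r2 = [
    [1.00, 0.90, 0.85, 0.10, 0.00],
    [0.90, 1.00, 0.50, 0.10, 0.00],
    [0.85, 0.50, 1.00, 0.10, 0.00],
    [0.10, 0.10, 0.10, 1.00, 0.95],
    [0.00, 0.00, 0.00, 0.95, 1.00],
]
print(greedy_tag_snps(example_r2))  # [0, 3]
```

In this toy case SNP 0 tags SNPs 1 and 2, and SNP 3 tags SNP 4, so two tags suffice. A greedy cover like this gives no optimality guarantee on its own, which is why lower bounds of the kind described in the abstract are useful for judging solution quality.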
Affiliation(s)
- Lan Liu
- Department of Computer Science and Engineering, University of California, Riverside, California, USA.
7
Lin S, Ding J. Integration of ranked lists via cross entropy Monte Carlo with applications to mRNA and microRNA Studies. Biometrics 2008; 65:9-18. [PMID: 18479487 DOI: 10.1111/j.1541-0420.2008.01044.x] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.3] [Indexed: 11/26/2022]
Abstract
One of the major challenges facing researchers studying complex biological systems is integration of data from -omics platforms. Omic-scale data include DNA variations, transcriptome profiles, and RNAomics. Selection of an appropriate approach for a data-integration task is problem dependent, primarily dictated by the information contained in the data. In situations where modeling of multiple raw datasets jointly might be extremely challenging due to their vast differences, rankings from each dataset would provide a commonality based on which results could be integrated. Aggregation of microRNA targets predicted from different computational algorithms is such a problem. Integration of results from multiple mRNA studies based on different platforms is another example that will be discussed. Formulating the problem of integrating ranked lists as minimizing an objective criterion, we explore the usage of a cross entropy Monte Carlo method for solving such a combinatorial problem. Instead of placing a discrete uniform distribution on all the potential solutions, an iterative importance sampling technique is utilized "to slowly tighten the net" to place most distributional mass on the optimal solution and its neighbors. Extensive simulation studies were performed to assess the performance of the method. With satisfactory simulation results, the method was applied to the microRNA and mRNA problems to illustrate its utility.
Affiliation(s)
- Shili Lin
- Department of Statistics, The Ohio State University, Columbus, Ohio 43210-1247, USA.
8
Windelinckx A, Vlietinck R, Aerssens J, Beunen G, Thomis MAI. Selection of genes and single nucleotide polymorphisms for fine mapping starting from a broad linkage region. Twin Res Hum Genet 2008; 10:871-85. [PMID: 18179400 DOI: 10.1375/twin.10.6.871] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/05/2022]
Abstract
Fine mapping of linkage peaks is one of the great challenges facing researchers who try to identify genes and genetic variants responsible for the variation in a certain trait or complex disease. Once the trait is linked to a certain chromosomal region, most studies use a candidate gene approach followed by a selection of polymorphisms within these genes, either based on their possibility to be functional, or based on the linkage disequilibrium between adjacent markers. For both candidate gene selection and SNP selection, several approaches have been described, and different software tools are available. However, mastering all these information sources and choosing between the different approaches can be difficult and time-consuming. Therefore, this article lists several of these in silico procedures, and the authors describe an empirical two-step fine mapping approach, in which candidate genes are prioritized using a bioinformatics approach (ENDEAVOUR), and the top genes are chosen for further SNP selection with a linkage disequilibrium based method (Tagger). The authors present the different actions that were applied within this approach on two previously identified linkage regions for muscle strength. This resulted in the selection of 331 polymorphisms located in 112 different candidate genes out of an initial set of 23,300 SNPs.
Affiliation(s)
- An Windelinckx
- Research Center for Exercise and Health, Department of Biomedical Kinesiology, Faculty of Kinesiology and Rehabilitation Sciences, Katholieke Universiteit Leuven, Leuven, Belgium