1
Zhao J, Huang JX, Deng H, Chang Y, Xia L. Are Topics Interesting or Not? An LDA-based Topic-graph Probabilistic Model for Web Search Personalization. ACM T INFORM SYST 2022. [DOI: 10.1145/3476106] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/01/2022]
Abstract
In this article, we propose a Latent Dirichlet Allocation (LDA)-based topic-graph probabilistic personalization model for Web search. This model represents a user graph in a latent topic graph and simultaneously estimates the probabilities that the user is interested in the topics, as well as the probabilities that the user is not interested in the topics. For a given query issued by the user, the webpages that are more relevant to the interesting topics are promoted, and the webpages that are more relevant to the non-interesting topics are penalized. In particular, we simulate a user’s search intent by building two profiles: a positive user profile for the probabilities that the user is interested in the topics and a corresponding negative user profile for the probabilities that the user is not interested in the topics. The profiles are estimated based on the user’s search logs. A clicked webpage is assumed to include interesting topics. A skipped (viewed but not clicked) webpage is assumed to cover some topics that are not interesting to the user. Such estimations are performed in the latent topic space generated by LDA. Moreover, a new approach is proposed to estimate the correlation between a given query and the user’s search history so as to determine how much personalization should be considered for the query. We compare our proposed models with several strong baselines, including state-of-the-art personalization approaches. Experiments conducted on a large-scale real user search log collection illustrate the effectiveness of the proposed models.
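The promote-and-penalize idea in this abstract can be illustrated with a minimal sketch. This is not the authors' model: the linear combination, the function names (`personalized_score`, `rerank`), and the `weight` parameter (standing in for the query-to-history correlation) are all assumptions made for illustration. Topic distributions are taken as given, for example from an LDA model trained elsewhere.

```python
from typing import List, Sequence, Tuple


def personalized_score(
    base_score: float,
    doc_topics: Sequence[float],
    pos_profile: Sequence[float],
    neg_profile: Sequence[float],
    weight: float,
) -> float:
    """Adjust a base relevance score using two user profiles.

    Hypothetical scoring rule (an assumption, not the paper's formula):
    topics the user is likely interested in raise the score, topics the
    user is likely not interested in lower it, scaled by `weight`.
    """
    promote = sum(p * t for p, t in zip(pos_profile, doc_topics))
    penalize = sum(n * t for n, t in zip(neg_profile, doc_topics))
    return base_score + weight * (promote - penalize)


def rerank(
    results: List[Tuple[str, float, Sequence[float]]],
    pos_profile: Sequence[float],
    neg_profile: Sequence[float],
    weight: float,
) -> List[Tuple[str, float]]:
    """Re-rank (doc_id, base_score, doc_topics) triples, best first."""
    scored = [
        (doc_id, personalized_score(base, topics, pos_profile, neg_profile, weight))
        for doc_id, base, topics in results
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Two documents with equal base scores but different topic mixtures.
results = [("a", 1.0, [0.9, 0.1]), ("b", 1.0, [0.1, 0.9])]
pos_profile = [0.8, 0.1]  # assumed P(interested | topic k)
neg_profile = [0.1, 0.7]  # assumed P(not interested | topic k)
ranking = rerank(results, pos_profile, neg_profile, weight=1.0)
```

Setting `weight` to zero leaves the base ranking unchanged, which mirrors the abstract's point that the amount of personalization should depend on how strongly the query relates to the user's search history.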
Affiliation(s)
- Jiashu Zhao
- Department of Physics and Computer Science, Wilfrid Laurier University, Waterloo, Canada
- Jimmy Xiangji Huang
- Information Retrieval and Knowledge Management Lab, York University, Toronto, Canada
- Yi Chang
- School of Artificial Intelligence, Jilin University, Jilin, China
- Long Xia
- Information Retrieval & Knowledge Management Research Lab, York University, Toronto, Canada
2
Zhao J, Huang JX, Ye Z. Modeling Term Associations for Probabilistic Information Retrieval. ACM T INFORM SYST 2014. [DOI: 10.1145/2590988] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 10/25/2022]
Abstract
Traditionally, in many probabilistic retrieval models, query terms are assumed to be independent. Although such models can achieve reasonably good performance, associations can exist among terms from a human being’s point of view. There are some recent studies that investigate how to model term associations/dependencies by proximity measures. However, the modeling of term associations theoretically under the probabilistic retrieval framework is still largely unexplored. In this article, we introduce a new concept, cross term, to model term proximity, with the aim of boosting retrieval performance. With cross terms, the association of multiple query terms can be modeled in the same way as a simple unigram term. In particular, an occurrence of a query term is assumed to have an impact on its neighboring text. The degree of the query-term impact gradually weakens with increasing distance from the place of occurrence. We use shape functions to characterize such impacts. Based on this assumption, we first propose a bigram CRoss TErm Retrieval (CRTER_2) model as the basis model, and then recursively propose a generalized n-gram CRoss TErm Retrieval (CRTER_n) model for n query terms, where n > 2. Specifically, a bigram cross term occurs when the corresponding query terms appear close to each other, and its impact can be modeled by the intersection of the respective shape functions of the query terms. For an n-gram cross term, we develop several distance metrics with different properties and employ them in the proposed models for ranking. We also show how to extend the language model using the newly proposed cross terms. Extensive experiments on a number of TREC collections demonstrate the effectiveness of our proposed models.
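The shape-function idea for a bigram cross term can be sketched as follows. This is an illustrative reading of the abstract, not the paper's exact formulation: the triangle-shaped kernel, the choice to evaluate it at half the distance between two occurrences (the point where two identical symmetric shapes intersect), and the names `triangle_kernel`, `cross_term_frequency`, and `sigma` are all assumptions.

```python
from typing import Sequence


def triangle_kernel(distance: float, sigma: float) -> float:
    """Impact of a term occurrence at a given distance.

    Assumed shape function: impact is 1 at the occurrence and decays
    linearly to 0 at distance `sigma`.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return max(0.0, 1.0 - distance / sigma)


def cross_term_frequency(
    positions_a: Sequence[int],
    positions_b: Sequence[int],
    sigma: float,
) -> float:
    """Pseudo term frequency of the bigram cross term for terms a and b.

    Each pair of occurrences contributes the height at which their two
    shape functions intersect, which for identical symmetric kernels is
    the kernel value at half the distance between the occurrences.
    """
    return sum(
        triangle_kernel(abs(a - b) / 2.0, sigma)
        for a in positions_a
        for b in positions_b
    )


# Occurrences two positions apart overlap strongly; distant ones do not.
near = cross_term_frequency([3], [5], sigma=4.0)
far = cross_term_frequency([0], [100], sigma=4.0)
```

A value computed this way could then be fed into a ranking function in place of a unigram term frequency, which is the sense in which the abstract says cross terms can be modeled like simple unigram terms.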
Affiliation(s)
- Jiashu Zhao
- Information Retrieval and Knowledge Management Research Lab, York University
- Jimmy Xiangji Huang
- Information Retrieval and Knowledge Management Research Lab, York University
- Zheng Ye
- Information Retrieval and Knowledge Management Research Lab, York University