1. Wu C, Lian D, Ge Y, Zhu Z, Chen E. Influence-Driven Data Poisoning for Robust Recommender Systems. IEEE Trans Pattern Anal Mach Intell 2023;45:11915-11931. [PMID: 37163407] [DOI: 10.1109/tpami.2023.3274759]
Abstract
Recent studies have shown that recommender systems are vulnerable: attackers can easily inject well-designed malicious profiles into the system, resulting in biased recommendations. Since such data can neither be barred from entering the system nor dismissed as illegitimate, studying recommendation robustness is imperative. Despite impressive emerging work, threat assessment for the bi-level poisoning problem and the imperceptibility of poisoning users remain key open challenges. To this end, we propose Infmix, an efficient poisoning attack strategy. Specifically, Infmix consists of an influence-based threat estimator and a user generator, Usermix. First, the influence-based estimator can efficiently evaluate a user's harm to the recommender system without retraining, which is challenging for existing attacks. Second, Usermix, a distribution-agnostic generator, can generate unnoticeable fake data even with only a few known users. Under the guidance of the threat estimator, Infmix selects the users with the largest attacking impact from the quasi-real candidates generated by Usermix. Extensive experiments demonstrate Infmix's superiority in attacks against six recommender systems on four real datasets. Additionally, we propose a novel defense strategy, adversarial poisoning training (APT). It mimics the poisoning process by injecting fake users (ERM users) committed to minimizing empirical risk, in order to build a robust system. As with Infmix, we utilize the influence function to solve the bi-level optimization challenge of generating ERM users. Although the idea of "fighting fire with fire" in APT seems counterintuitive, we prove its effectiveness in improving recommendation robustness through theoretical analysis and empirical experiments.
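The influence-function idea in this entry — estimating how an injected profile shifts a trained model without retraining it — can be illustrated on a ridge-regression stand-in. This is a minimal sketch, not the paper's Infmix: the model, the helper `influence_of_new_point`, and all data are illustrative assumptions.

```python
import numpy as np

def influence_of_new_point(X, y, x_new, y_new, x_probe, lam=0.1):
    """First-order (influence-function) estimate of how much adding the
    point (x_new, y_new) shifts the prediction at x_probe, without
    refitting: d_pred ~= -x_probe^T H^{-1} grad_new."""
    d = X.shape[1]
    H = X.T @ X + lam * np.eye(d)               # Hessian of the ridge loss
    theta = np.linalg.solve(H, X.T @ y)         # fitted parameters
    grad_new = (x_new @ theta - y_new) * x_new  # gradient of the new point's loss
    return -x_probe @ np.linalg.solve(H, grad_new)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.01 * rng.normal(size=200)
x_new = rng.normal(size=5)
y_new = 5.0        # an "attack" point with an inflated label
x_probe = x_new    # probe the prediction at the injected point itself

approx = influence_of_new_point(X, y, x_new, y_new, x_probe)

# Ground truth: actually retrain with the extra point and compare.
lam = 0.1
theta0 = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)
X2, y2 = np.vstack([X, x_new]), np.append(y, y_new)
theta1 = np.linalg.solve(X2.T @ X2 + lam * np.eye(5), X2.T @ y2)
exact = x_probe @ (theta1 - theta0)
```

The approximation and the retrained ground truth agree to first order, which is what lets an attacker (or a defender generating ERM users) rank candidate injections cheaply instead of retraining once per candidate.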
2. Matias JN. Influencing recommendation algorithms to reduce the spread of unreliable news by encouraging humans to fact-check articles, in a field experiment. Sci Rep 2023;13:11715. [PMID: 37474541] [PMCID: PMC10359256] [DOI: 10.1038/s41598-023-38277-5]
Abstract
Society often relies on social algorithms that adapt to human behavior. Yet scientists struggle to generalize the combined behavior of mutually-adapting humans and algorithms. This scientific challenge is a governance problem when algorithms amplify human responses to falsehoods. Could attempts to influence humans have second-order effects on algorithms? Using a large-scale field experiment, I test if influencing readers to fact-check unreliable sources causes news aggregation algorithms to promote or lessen the visibility of those sources. Interventions encouraged readers to fact-check articles or to fact-check and provide votes to the algorithm. Across 1104 discussions, these encouragements increased human fact-checking and reduced vote scores on average. The fact-checking condition also caused the algorithm to reduce the promotion of articles over time by as much as 25 rank positions on average, enough to remove an article from the front page. Overall, this study offers a path for the science of human-algorithm behavior by experimentally demonstrating how influencing collective human behavior can also influence algorithm behavior.
Affiliation(s)
- J Nathan Matias
- Department of Communication, Cornell University, Ithaca, NY, USA.
- Center for Advanced Study in the Behavioral Sciences, Stanford University, Stanford, CA, USA.
3. Gulsoy M, Yalcin E, Bilge A. Robustness of privacy-preserving collaborative recommenders against popularity bias problem. PeerJ Comput Sci 2023;9:e1438. [PMID: 37547423] [PMCID: PMC10403214] [DOI: 10.7717/peerj-cs.1438]
Abstract
Recommender systems have become increasingly important in today's digital age, but they are not without challenges. One of the most significant is that users are not always willing to share their preferences due to privacy concerns, yet they still expect decent recommendations. Privacy-preserving collaborative recommenders address such concerns by letting users set their privacy preferences before submitting their data to the recommendation provider. Another recently discussed challenge is popularity bias, where the system tends to recommend popular items more often than less popular ones, limiting the diversity of recommendations and preventing users from discovering new and interesting items. In this article, we comprehensively analyze the randomized perturbation-based data disguising procedure of privacy-preserving collaborative recommender algorithms against the popularity bias problem. For this purpose, we construct user personas with varying privacy protection levels and scrutinize the performance of ten recommendation algorithms on these personas from both accuracy and beyond-accuracy perspectives. We also investigate how well-known popularity-debiasing strategies combat the issue in privacy-preserving environments. In the experiments, we employ three well-known real-world datasets. The key findings of our analysis reveal that privacy-sensitive users receive unbiased and fairer recommendations, of high quality in diversity, novelty, and catalogue coverage, in exchange for a tolerable sacrifice in accuracy. Moreover, prominent popularity-debiasing strategies fall considerably short as the provided privacy level increases.
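The randomized perturbation-based disguising this entry analyses can be sketched in a few lines: the user distorts their ratings locally before submission, with more distortion at higher privacy levels. The `disguise` helper and its Gaussian noise model are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def disguise(ratings, privacy_level, rng):
    """Randomized-perturbation disguising: centre the ratings and add
    zero-mean Gaussian noise whose spread grows with the user-chosen
    privacy level, before anything leaves the user's device."""
    centered = ratings - ratings.mean()
    sigma = max(privacy_level * ratings.std(), 1e-12)
    return centered + rng.normal(0.0, sigma, size=ratings.shape)

rng = np.random.default_rng(42)
true_ratings = np.array([5.0, 2.0, 4.0, 1.0, 3.0])
centered = true_ratings - true_ratings.mean()

# Two personas: weak vs strong privacy protection
low_priv = disguise(true_ratings, 0.1, rng)
high_priv = disguise(true_ratings, 2.0, rng)

# The distortion the server sees grows with the privacy level
err_low = np.abs(low_priv - centered).mean()
err_high = np.abs(high_priv - centered).mean()
```

The privacy level trades off exactly the two quantities the study measures: stronger perturbation hides individual preferences (including the popularity signal) from the server, at the cost of noisier inputs to the recommender.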
Affiliation(s)
- Mert Gulsoy
- Distance Education Research Center, Alaaddin Keykubat University, Antalya, Turkey
- Computer Engineering Department, Akdeniz University, Antalya, Turkey
- Emre Yalcin
- Computer Engineering Department, Sivas Cumhuriyet University, Sivas, Turkey
- Alper Bilge
- Computer Engineering Department, Akdeniz University, Antalya, Turkey
4. Wu X, Liao H. Managing uncertain preferences of consumers in product ranking by probabilistic linguistic preference relations. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.110240]
5. Nguyen TT, Quach KND, Nguyen TT, Huynh TT, Vu VH, Le Nguyen P, Jo J, Nguyen QVH. Poisoning GNN-based Recommender Systems with Generative Surrogate-based Attacks. ACM Trans Inf Syst 2022. [DOI: 10.1145/3567420]
Abstract
With recent advancements in graph neural networks (GNN), GNN-based recommender systems (gRS) have achieved remarkable success in the past few years. Despite this success, existing research reveals that gRSs are still vulnerable to poison attacks, in which attackers inject fake data to manipulate recommendation results as they desire. This may be because existing poison attacks (and countermeasures) are either model-agnostic or specifically designed for traditional recommender algorithms (e.g., neighbourhood-based, matrix-factorisation-based, or deep-learning-based RSs) that are not gRSs. As gRSs are widely adopted in industry, designing poison attacks for gRSs has become essential to understanding and hardening the user experience. Herein, we focus on the use of poison attacks to manipulate item promotion in gRSs. Compared to standard GNNs, attacking gRSs is more challenging due to the heterogeneity of the network structure and the entanglement between users and items. To overcome these challenges, we propose GSPAttack, a generative surrogate-based poison attack framework for gRSs. GSPAttack tailors a learning process to surrogate a recommendation model and to generate fake users and user-item interactions while preserving the data correlation between users and items for recommendation accuracy. Although maintaining high accuracy for items other than the target item seems counterintuitive, it is equally crucial to the success of a poison attack. Extensive evaluations on four real-world datasets revealed that GSPAttack outperforms all baselines with competent recommendation performance and is resistant to various countermeasures.
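GSPAttack itself learns a generative surrogate, which is beyond a short sketch. As a much simpler stand-in for the item-promotion goal it pursues, the classic bandwagon-style injection below promotes a target item against an item-KNN surrogate. All data and the `recommend_scores` helper are illustrative, not the paper's method.

```python
import numpy as np

def recommend_scores(R, user):
    """Item-based KNN surrogate: score each unrated item by its mean
    cosine similarity to the items the user has already rated."""
    norms = np.linalg.norm(R, axis=0) + 1e-12
    S = (R.T @ R) / np.outer(norms, norms)   # item-item cosine similarity
    rated = R[user] > 0
    scores = S[:, rated].mean(axis=1)
    scores[rated] = -np.inf                  # never re-recommend rated items
    return scores

# 6 genuine users x 5 items; item 4 is the attack target and unrated
R = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 1, 0, 0],
    [5, 5, 0, 0, 0],
    [1, 0, 4, 5, 0],
    [0, 1, 5, 4, 0],
    [5, 4, 1, 0, 0],
], dtype=float)

before = recommend_scores(R, user=2)

# Injection: fake profiles pair the target (item 4) with the popular
# items 0 and 1, pulling it into their similarity neighbourhood.
fakes = np.array([[5, 5, 0, 0, 5]] * 3, dtype=float)
after = recommend_scores(np.vstack([R, fakes]), user=2)
```

After injection, the target item tops user 2's recommendation list, while scores for the remaining items barely move — the same "keep accuracy elsewhere" property the abstract calls crucial for an unnoticeable attack.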
Affiliation(s)
- Thanh Tam Nguyen
- Faculty of Information Technology, HUTECH University, Ho Chi Minh City, Vietnam
- Viet Hung Vu
- Hanoi University of Science and Technology, Vietnam
- Jun Jo
- Griffith University, Australia
6. Detecting shilling groups in online recommender systems based on graph convolutional network. Inf Process Manag 2022. [DOI: 10.1016/j.ipm.2022.103031]
7. Dong M, Yuan F, Yao L, Wang X, Xu X, Zhu L. A survey for trust-aware recommender systems: A deep learning perspective. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108954]
8. Himeur Y, Sohail SS, Bensaali F, Amira A, Alazab M. Latest trends of security and privacy in recommender systems: A comprehensive review and future perspectives. Comput Secur 2022. [DOI: 10.1016/j.cose.2022.102746]
10. Aiolli F, Conti M, Picek S, Polato M. On the feasibility of crawling-based attacks against recommender systems. J Comput Secur 2021. [DOI: 10.3233/jcs-210041]
Abstract
Nowadays, online services, like e-commerce or streaming services, provide a personalized user experience through recommender systems. Recommender systems are built upon a vast amount of data about users/items acquired by the services. Such knowledge represents an invaluable resource. However, commonly, part of this knowledge is public and can be easily accessed via the Internet. Unfortunately, that same knowledge can be leveraged by competitors or malicious users. The literature offers a large number of works concerning attacks on recommender systems, but most of them assume that the attacker can easily access the full rating matrix. In practice, this is never the case. The only way to access the rating matrix is by gathering the ratings (e.g., reviews) by crawling the service’s website. Crawling a website has a cost in terms of time and resources. What is more, the targeted website can employ defensive measures to detect automatic scraping. In this paper, we assess the impact of a series of attacks on recommender systems. Our analysis aims to set up the most realistic scenarios considering both the possibilities and the potential attacker’s limitations. In particular, we assess the impact of different crawling approaches when attacking a recommendation service. From the collected information, we mount various profile injection attacks. We measure the value of the collected knowledge through the identification of the most similar user/item. Our empirical results show that while crawling can indeed bring knowledge to the attacker (up to 65% of neighborhood reconstruction on a mid-size dataset and up to 90% on a small-size dataset), this will not be enough to mount a successful shilling attack in practice.
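The crawling trade-off this paper measures — a partial crawl yields only partial neighbourhood knowledge — can be sketched with a synthetic rating matrix: the attacker observes each rating with some probability and then tries to reconstruct each user's nearest neighbour. The data, crawl model, and helpers are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def nearest_neighbour(R, user):
    """Index of the most similar other user under cosine similarity."""
    norms = np.linalg.norm(R, axis=1) + 1e-12
    sims = (R @ R[user]) / (norms * norms[user])
    sims[user] = -np.inf
    return int(np.argmax(sims))

def crawl(R, fraction, rng):
    """Partial crawl: each rating is observed with probability
    `fraction`; unobserved entries stay unknown (0)."""
    return R * (rng.random(R.shape) < fraction)

rng = np.random.default_rng(1)
R_full = rng.integers(0, 6, size=(50, 30)).astype(float)  # the service's matrix
truth = [nearest_neighbour(R_full, u) for u in range(50)]

# Neighbourhood reconstruction rate at three crawling budgets
recon = {}
for frac in (0.1, 0.5, 1.0):
    R_crawled = crawl(R_full, frac, rng)
    recon[frac] = float(np.mean([nearest_neighbour(R_crawled, u) == truth[u]
                                 for u in range(50)]))
```

A full crawl reconstructs every neighbourhood by construction, while a sparse crawl recovers only a fraction of them — the toy analogue of the paper's 65%/90% reconstruction figures, and of its conclusion that such partial knowledge may still fall short of a working shilling attack.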
Affiliation(s)
- Fabio Aiolli
- Department of Mathematics, University of Padova, Padova, Italy
- Mauro Conti
- Department of Mathematics, University of Padova, Padova, Italy
- Stjepan Picek
- Department of Intelligent Systems, Delft University of Technology, Delft, The Netherlands
- Mirko Polato
- Department of Mathematics, University of Padova, Padova, Italy
12. Selecting the Suitable Resampling Strategy for Imbalanced Data Classification Regarding Dataset Properties. An Approach Based on Association Models. Appl Sci (Basel) 2021. [DOI: 10.3390/app11188546]
Abstract
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class. This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples. Thus, the prediction model is unreliable although the overall model accuracy can be acceptable. Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class. However, their effectiveness depends on several factors mainly related to data intrinsic characteristics, such as imbalance ratio, dataset size and dimensionality, overlapping between classes or borderline examples. In this work, the impact of these factors is analyzed through a comprehensive comparative study involving 40 datasets from different application areas. The objective is to obtain models for automatic selection of the best resampling strategy for any dataset based on its characteristics. These models allow us to check several factors simultaneously considering a wide range of values since they are induced from very varied datasets that cover a broad spectrum of conditions. This differs from most studies that focus on the individual analysis of the characteristics or cover a small range of values. In addition, the study encompasses both basic and advanced resampling strategies that are evaluated by means of eight different performance metrics, including new measures specifically designed for imbalanced data classification. The general nature of the proposal allows the choice of the most appropriate method regardless of the domain, avoiding the search for special purpose techniques that could be valid for the target data.
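The resampling strategies the study compares can be illustrated with the simplest member of the family, random oversampling, which removes the imbalance ratio by duplicating minority examples. This pure-NumPy `random_oversample` helper is an illustrative sketch, not the study's code or any particular library implementation.

```python
import numpy as np

def random_oversample(X, y, rng):
    """Random oversampling: resample every class (with replacement)
    up to the size of the majority class, equalising class counts."""
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx = np.concatenate([rng.choice(np.flatnonzero(y == c),
                                     size=n_max, replace=True)
                          for c in classes])
    return X[idx], y[idx]

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4))
y = np.array([0] * 90 + [1] * 10)   # imbalance ratio 9:1

X_bal, y_bal = random_oversample(X, y, rng)
ratio_before = (y == 0).sum() / (y == 1).sum()
ratio_after = (y_bal == 0).sum() / (y_bal == 1).sum()
```

Random undersampling is the mirror image (sample every class down to the minority count); which of the two, or a more advanced variant, works best is exactly what the study predicts from dataset properties such as imbalance ratio, size, and class overlap.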
13. Detection of shilling attack in recommender system for YouTube video statistics using machine learning techniques. Soft Comput 2021. [DOI: 10.1007/s00500-021-05586-8]
14. Rezaimehr F, Dadkhah C. A survey of attack detection approaches in collaborative filtering recommender systems. Artif Intell Rev 2020. [DOI: 10.1007/s10462-020-09898-3]