My Blog List

Thursday, September 8, 2016

FRAUDAR: Bounding Graph Fraud in the Face of Camouflage, Bryan Hooi et al.


On Tuesday, I attended a talk held by School of Computer Science at CMU. The talk belongs to Artificial Intelligence Lunch Seminar series. It is given by Bryan Hooi who is a PhD student in CMU. He introduced their group’s recent work about detecting fraudsters in the face of camouflage in shopping websites or social networks. Their paper winned the ACM SIGKDD best paper award at KDD 2016 conference, a top conference in Data Mining and Data Science. In this blog, I would like to introduce their work.

As the rising popularity of social networks like Twitter, online shopping website Amazons and crowd-sourced review website Yelp, fraudsters start to emerge to manipulate these services. For example, some politicians might purchase fake followers by very large-scale size to promote their popularity; Businesses buy fake 5-star reviews to improve their product’s reputation and mislead customers. Those fake followers and fake review-writers are called “fraudulent users”, and those fraudulent followees and products are referred as “fraudulent customers”. The task of fraud detection contains identifying both of them.

Given a bipartite graph of users and products, or followers and followees, there would exist a dense subgraph composed of fraudulent users and customers, because they have to build many edges. Such densely-connected subgraph woule create a dense region in the adjacency matrix of the bipartite graph (shown in Figure 1). At the same time, fraudsters also employ some strategies to “look normal”, by adding edges to other non-customers. This behavior is called “camouflage”. Therefore, an effective fraud detection method should be camouflage-resistant.
Figure 1. Fraudsters and customers would create dense region in the adjacency matrix. To camouflage themselves, fraudulent users tend to add links to honest objects, by randomly choosing targets, biased to objects with high degrees, or using hijacked accounts from honest users.
Bryan and his colleagues propose a novel family of density metrics to detect fraudsters and present an effective algorithm named FRAUDAR (Please refer to the original paper for details.). Their algorithm FRAUDAR has many good properties – (a) it is camouflage-resistant, i.e. the outcome won’t be influenced by camouflage behaviors, (b) it provides a provable upper bound regarding how many fake edges can exist without being detected, and (c) it is near-linearly scalable which is applicable to large-scale real networks. The performance of FRAUDAR is shown in Figure 2.
Figure 2. FRAUDAR’s performance. (left) By comparing FRAUDAR with the state-of-art algorithm SpokEn[1] in three settings – none, random camouflage and biased camouflage, we can see that FRAUDAR is able to achieve high prediction accuracy even in the present of camouflage. (right) It respectively shows the prediction accuracy of FRAUDAR in spotting fraudulent users as well as in spotting fraudulent products.

[1] B. Prakash, M. Seshadri, A. Sridharan, S. Machiraju, and C. Faloutsos. Eigenspokes: Surprising patterns and  community structure in large graphs. PAKDD, 2010a, 84, 2010.

Details about talk

Title: FRAUDAR: Bounding Graph Fraud in the Face of Camouflage
Speaker: Bryan Hooi

No comments:

Post a Comment