On Tuesday, I attended a talk held by School of Computer Science at CMU. The talk belongs to Artificial Intelligence Lunch Seminar series. It is given by Bryan Hooi who is a PhD student in CMU. He introduced their group’s recent work about detecting fraudsters in the face of camouflage in shopping websites or social networks. Their paper winned the ACM SIGKDD best paper award at KDD 2016 conference, a top conference in Data Mining and Data Science. In this blog, I would like to introduce their work.
As the rising popularity of social networks like Twitter, online shopping website Amazons and crowd-sourced review website Yelp, fraudsters start to emerge to manipulate these services. For example, some politicians might purchase fake followers by very large-scale size to promote their popularity; Businesses buy fake 5-star reviews to improve their product’s reputation and mislead customers. Those fake followers and fake review-writers are called “fraudulent users”, and those fraudulent followees and products are referred as “fraudulent customers”. The task of fraud detection contains identifying both of them.
Given a bipartite graph of users and products, or followers and followees, there would exist a dense subgraph composed of fraudulent users and customers, because they have to build many edges. Such densely-connected subgraph woule create a dense region in the adjacency matrix of the bipartite graph (shown in Figure 1). At the same time, fraudsters also employ some strategies to “look normal”, by adding edges to other non-customers. This behavior is called “camouflage”. Therefore, an effective fraud detection method should be camouflage-resistant.
Bryan and his colleagues propose a novel family of density metrics to detect fraudsters and present an effective algorithm named FRAUDAR (Please refer to the original paper for details.). Their algorithm FRAUDAR has many good properties – (a) it is camouflage-resistant, i.e. the outcome won’t be influenced by camouflage behaviors, (b) it provides a provable upper bound regarding how many fake edges can exist without being detected, and (c) it is near-linearly scalable which is applicable to large-scale real networks. The performance of FRAUDAR is shown in Figure 2.
 B. Prakash, M. Seshadri, A. Sridharan, S. Machiraju, and C. Faloutsos. Eigenspokes: Surprising patterns and community structure in large graphs. PAKDD, 2010a, 84, 2010.
Details about talk
Title: FRAUDAR: Bounding Graph Fraud in the Face of Camouflage
Speaker: Bryan Hooi
Paper URL: http://www.kdd.org/kdd2016/subtopic/view/fraudar-bounding-graph-fraud-in-the-face-of-camouflage