Summary
On Tuesday, I attended a talk held by School of
Computer Science at CMU. The talk belongs to Artificial Intelligence Lunch
Seminar series. It is given by Bryan Hooi who is a PhD student in CMU. He
introduced their group’s recent work about detecting fraudsters in the face of
camouflage in shopping websites or social networks. Their paper winned the ACM
SIGKDD best paper award at KDD 2016 conference, a top conference in Data Mining
and Data Science. In this blog, I would like to introduce their work.
As the rising popularity of social networks like
Twitter, online shopping website Amazons and crowd-sourced review website Yelp,
fraudsters start to emerge to manipulate these services. For example, some politicians
might purchase fake followers by very large-scale size to promote their
popularity; Businesses buy fake 5-star reviews to improve their product’s
reputation and mislead customers. Those fake followers and fake review-writers
are called “fraudulent users”, and those fraudulent followees and products are
referred as “fraudulent customers”. The task of fraud detection contains
identifying both of them.
Given a bipartite graph of users and products,
or followers and followees, there would exist a dense subgraph composed of
fraudulent users and customers, because they have to build many edges. Such
densely-connected subgraph woule create a dense region in the adjacency matrix
of the bipartite graph (shown in Figure 1). At the same time, fraudsters also
employ some strategies to “look normal”, by adding edges to other
non-customers. This behavior is called “camouflage”. Therefore, an effective fraud
detection method should be camouflage-resistant.
Bryan and his colleagues propose a novel family
of density metrics to detect fraudsters and present an effective algorithm
named FRAUDAR (Please refer to the original paper for details.). Their
algorithm FRAUDAR has many good properties – (a) it is camouflage-resistant, i.e.
the outcome won’t be influenced by camouflage behaviors, (b) it provides a provable
upper bound regarding how many fake edges can exist without being detected, and
(c) it is near-linearly scalable which is applicable to large-scale real
networks. The performance of FRAUDAR is shown in Figure 2.
[1] B. Prakash, M. Seshadri, A. Sridharan, S. Machiraju, and C. Faloutsos. Eigenspokes: Surprising patterns and community structure in large graphs. PAKDD, 2010a, 84, 2010.
Details about talk
Title: FRAUDAR: Bounding Graph Fraud in the Face
of Camouflage
Speaker: Bryan Hooi
No comments:
Post a Comment