Tuesday, February 7, 2017

Week 6: A graph-based recommendation across heterogeneous domains

Background: Cross-domain recommendation has attracted a lot of attention in recent years. For example, we may want to recommend tagged movies from IMDb to Facebook users who share almost no tags with IMDb. The challenge is that the domains share no users or items, and there is no common feature space that can bridge the gap between them. Figure 1 shows the tag clouds of two websites: Douban (a movie rating website) and Weibo (a Twitter-like social networking site in China). The two groups of users clearly use two distinct sets of tags/words. This paper therefore tries to construct a connection between heterogeneous domains and to design an effective model for inferring the similarity between target users and the items to be recommended.

Figure 1. Tag clouds for (a) Douban and (b) Weibo.
Methods: The paper proposes a graph-based approach to cross-domain recommendation. Specifically, the authors use bipartite graphs to represent within-domain relationships between entities, such as user-feature and item-feature relationships. As shown in Figure 2, the red and blue edges are within-domain connections in domain 1 and domain 2. The feature sets of the two domains are distinct. To bridge this feature gap, they use online encyclopedias, Wikipedia (English) and Baike (Chinese), to perform semantic matching of tags. Wikipedia contains millions of hyperlinks among concepts, and this information indicates how correlated two tags are. Each tag is represented by a concept vector, in which each element is the tf-idf score of the tag's occurrence in the corresponding concept article. In Figure 2, these semantic relationships are drawn as green edges between domain 1 and domain 2. Given this composite multi-partite graph, they then run a propagation algorithm to infer the global similarity between target users and the items to be recommended.
Figure 2. Multi-partite graph across two domains.
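
To make the pipeline in the Methods paragraph concrete, here is a minimal Python sketch under stated assumptions. The concept vectors, tag names, users and movies are hypothetical toy values, not data from the paper. The post does not say which propagation algorithm the authors use, so a random walk with restart (with an assumed restart probability of 0.15) stands in for it here. The cross-domain (green) edges are weighted by the cosine similarity of the tags' concept vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def propagate(edges, source, restart=0.15, iters=100):
    """Random walk with restart from `source` on a weighted undirected graph.

    `edges` maps (node_a, node_b) -> weight; zero-weight edges are dropped.
    This is an assumed stand-in for the paper's propagation algorithm.
    """
    adj = {}
    for (a, b), w in edges.items():
        if w <= 0:
            continue
        adj.setdefault(a, {})[b] = w
        adj.setdefault(b, {})[a] = w
    scores = {n: 0.0 for n in adj}
    scores[source] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in adj}
        for n, s in scores.items():
            total = sum(adj[n].values())
            for m, w in adj[n].items():
                # Spread the walker's mass to neighbours in proportion to edge weight.
                nxt[m] += (1 - restart) * s * w / total
        nxt[source] += restart  # jump back to the target user
        scores = nxt
    return scores


# Hypothetical concept vectors: concept article -> tf-idf weight of the tag.
concept_vectors = {
    "space":   {"Science fiction": 0.7, "Astronomy": 0.6},
    "dating":  {"Romance film": 0.5, "Relationship": 0.7},
    "sci-fi":  {"Science fiction": 0.9, "Film": 0.3},
    "romance": {"Romance film": 0.8, "Film": 0.2},
}
domain1_tags = ["space", "dating"]    # tags used by users in domain 1
domain2_tags = ["sci-fi", "romance"]  # tags attached to items in domain 2

# Within-domain edges: user-feature in domain 1, item-feature in domain 2.
edges = {
    ("user:alice", "space"): 1.0,
    ("user:alice", "dating"): 0.2,
    ("movie:A", "sci-fi"): 1.0,
    ("movie:B", "romance"): 1.0,
}
# Cross-domain semantic edges between the two tag vocabularies.
for t1 in domain1_tags:
    for t2 in domain2_tags:
        edges[(t1, t2)] = cosine(concept_vectors[t1], concept_vectors[t2])

scores = propagate(edges, "user:alice")
ranked = sorted(["movie:A", "movie:B"], key=lambda m: scores[m], reverse=True)
print(ranked)
```

In this toy graph the user and the movies share no tags at all, so any non-zero movie score comes entirely from the semantic edges between the two tag vocabularies.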
Experiments: They run experiments on two datasets. The first is the Douban-Weibo data. Douban is a movie rating website and Weibo is a popular social networking site in China. Volunteers were hired to assess the recommendation results in order to generate ground truth. The second is the Diabetes data, in which Diabetes I (II) is dedicated to patients with Type I (II) diabetes. The task is to recommend discussion threads from domain II to patients in domain I. Because no ground truth is available, they generate a set of "ground truth" by picking out threads that have high similarity with patients in domain I. The baselines are random recommendation, lexical matching, explicit semantic analysis matching and Google distance. Using metrics such as precision at rank k, average precision at rank k and nDCG, they show that their graph-based method achieves the best performance in cross-domain recommendation.

Figure 3. Human assessment results for Douban-Weibo data.

Figure 4. Performance of all methods in Diabetes data.
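
For readers unfamiliar with the ranking metrics named in the Experiments paragraph, the following Python sketch shows one common way to compute them. The function names and the example relevance list are illustrative assumptions, and the paper may define the metrics slightly differently (for instance, average precision has several normalization variants).

```python
import math


def precision_at_k(relevance, k):
    """Fraction of the top-k recommendations that are relevant."""
    return sum(1 for r in relevance[:k] if r > 0) / k


def average_precision_at_k(relevance, k):
    """Mean of precision@i over the ranks i <= k that hold a relevant item."""
    hits, total = 0, 0.0
    for i, r in enumerate(relevance[:k], start=1):
        if r > 0:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0


def ndcg_at_k(relevance, k):
    """DCG of the ranking divided by the DCG of the ideal ordering."""
    def dcg(rels):
        return sum(r / math.log2(i + 1) for i, r in enumerate(rels, start=1))

    ideal = dcg(sorted(relevance, reverse=True)[:k])
    return dcg(relevance[:k]) / ideal if ideal else 0.0


# Hypothetical relevance labels for one ranked list (1 = relevant, 0 = not).
ranking = [1, 0, 1, 0, 0]
p3 = precision_at_k(ranking, 3)
ap3 = average_precision_at_k(ranking, 3)
ndcg3 = ndcg_at_k(ranking, 3)
print(p3, ap3, ndcg3)
```

Precision at k ignores the order within the top k, while average precision and nDCG both reward placing relevant items nearer the top.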
