My Blog List

Tuesday, April 25, 2017

Reading 15: User-Driven System-Mediated Collaborative Information Retrieval

Motivation: Collaborative information retrieval (CIR) involves more than one information seekers in a searching process. Different from individual information retrieval (IIR), CIR needs to take into account users' skills and preferences, also to support their communication and cooperations. Existing approaches can be generally divided into two categories. One is user-based CIR system which allows users to decide role assignments and supports their communication in searching process; the other one is system-based CIR which system would impose roles to users and optimizes information retrieval. Both of them have limitations and merits. This paper tries to combine them together, and proposes a user-driven system-mediated CIR system.

Approach: In the user-driven system-mediated CIR, users' searching behaviors would be monitored and analyzed; it would discover significant behavioral differences between each pair of users; finally it suggests roles to users in order to leverage the best of users' skills and preferences. Take the pair of roles "Gatherer versus Surveyor" as an example. In a collaborative searching process, gatherer would be more likely to seek highly relevant documents, so that he or she would perform queries with much overlap, and spend more time reading webpage contents; whereas, surveyor would tend to explore new things on websites, he or she would try different queries, spend less them on each webpage and accordingly, the query success is low. By analyzing such features, i.e. query overlap, dwell time and query success, we can obtain insights into user's role difference.

Experiments: They ask participants (students in Rutgers University) to collaboratively write a report on an exploratory topic. User study #1 has the topic of "Gulf oil spill" and user study #2 focuses on the topic of "Global warming". In the searching session, supportive chat system and search tools enable bookmarking webpages and saving snippets. Specifically, three types of features are considered (shown in Table 1).
Table 1. Features used to describe a searching session.
By analyzing users' searching behaviors, they found that significant behavioral difference became obvious after a short period of time as a session started. Figure 1 reveals this observation as p values are very significant. Besides, during the same session, users do not change their roles.
Figure 1. Significant difference in users' search behaviors.
Finally, they compare the effect of role mining on information retrieval in a CIR task. Table 2 shows the comparison with four baselines and report the average increase that RB-CIR has obtained. From it we observe that, RB-CIR outperforms all other methods except for PM-CIR. They argued that since the difference between RB-CIR and PM-CIR is not significant, it's still reasonable to say that RB-CIR has better performance.
Table 2. Comparison of RB-CIR with four baselines.
One limitation in their proposed method is the lacking of prior knowledge of users' skills as roles are mined from current searching behaviors. Therefore, in the further they would consider users' prior searching behaviors and preferences to complement current actions.

Thursday, April 20, 2017

Reading 14: Jointly Modeling Aspects, Ratings and Sentiments for Movie Recommendation (JMARS)

Motivation: In recommendation and review systems, users might not only provide overall ratings but also write more informative reviews. In reviews, users usually express their sentiments, discuss the aspects they like or do not like, and provide interpretations about their ratings. It is necessary to exploit such implicit comments to boost recommendation performance. For such purpose, this paper determines to model ratings and sentiments in comments in a per-aspect way, and finally they propose a probabilistic model based on topic modeling and collaborative filtering that holds superior performance.
Fig. 1 Rating and review model in a per-aspect way. It consists of two parts: modeling ratings and modeling reviews.
Approaches: Given a user and a movie, their task is to predict (i) the observed rating as well as (ii) the review. (i) In terms of ratings, they assume that any observed overall rating is generated from individual aspect-specific ratings, including rating for actors, rating for plots and rating for visual effect etc. Different aspects (actors, plot, visual effect) may hold different importance weights. Larger value implies that user has an interest in such aspect and the movie also highlights such aspect. Simply speaking, high overall rating is generated through a good matching between user's expectations and movie's quality in those aspects that user cares most. (ii) When writing a review, the user might talk about movie specific contents, and also express aspect specific judgements (sentiments). To describe such variety, they assume that the review language model contains five components:
1. A background language component;
2. A background sentiment language component;
3. A movie-specific language component;
4. An aspect-specific language component;
5. An aspect-specific sentiment language component.

Fig.1 illustrate the entire framework of their probabilistic model. User's interest and movie's relevance are collectively used to model the final rating and review contents. Their model is called "JMARS".

Experiments: The experiment is conducted on a dataset collected from IMDb - a famous movie review website. In total, 50k movies along with their reviews are crawled. They use 80% of data as training data, 10% as validation and 10% for testing. Fig. 2 shows comparison of JMARS in terms of perplexity. When factor size is set as 5 or 10 (5 or 10 aspects are considered in movie), JMARS could always outperforms HFT approach. Besides, Fig. 3 reveals MSE comparison which also proves JMARS good performance.

Fig. 2 Comparison of models using perplexity.
Fig. 3 Comparison of models in terms of MSE.

Tuesday, April 11, 2017

Week 13: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Title: Visual-Textual Joint Relevance Learning for Tag-based Social Image Search, IEEE Transactions on Image Processing, 2013.

Motivation: With the development of social media, a new type of search has become increasingly popular - social search, and this paper particularly focuses on image search in social media. Given a tag query, for example "apple", a good search engine is able to output a set of highly relevant but also diverse images showing fruit apples, cellphone and MacBook. Tag-based social image search always leverages user-generated tags to calculate image's relevance score, however, such tags contain too much noises and it's difficult to form an optimal ranking strategy. Therefore, this paper seeks to simultaneously utilizes tags as well as visual information for image relevance learning.
Fig 1. Framework of the proposed visual-textual joint relevance learning method.
Method: The basic framework can be seen in Fig 1. (I) Given a set of images, each of them can be represented by two kinds of features - visual as well as textual features. (II) Based on such two types of representations, a hypergraph can be constructed. Here it should be highlighted that hypergraph is different from the general graph. In a hypergraph, edge is called "hyperedge", which does not represent pairwise interactions, while it is a relationship consisting of a set of images. To be specific, the nodes sharing the same tags can be "linked" by a textual-based hyperedge, and the images sharing the same visual "words" can be connected by a visual-based hyperedge. Fig 2. reveals textual-based hyperedges (left) and visual-based hyperedges (right). (III) A joint image relevance learning process is performed on a set of pseudo-relevant samples. Pseudo-relevant samples are actually labeled images collected based on tags. Then they propose an objective function aiming to learn a relevance vector f with each element indicating an image's relevance score. (IV) Based on the learned relevance score, the algorithm will return top-K images to users.

Fig 2. Examples of hyperedge construction. The left figure shows textual-based hyperedges and the right one shows visual-based hyperedges.
Experiment: They perform experiments on Flickr Image Dataset (104,000 images, 83,999 tags), and compare the proposed textual-visual joint hypergraph learning approach (HG-WE-joint) with five stat-of-the-art baselines, including graph-based semi-supervised learning, sequential social image relevance learning, tag ranking, tag relevance combination, hypergraph-based relevance learning with equal weight (HG). It also examines the performance of the proposed approach with only single information, HG-WE-visual or HG-WE-textual. Results show that HG-WE-joint outperforms all baselines and maintains good robustness to parameters. However, HG-WE-joint requires the highest computational cost to achieve the best retrieval performance.

Wednesday, April 5, 2017

Talk summary 2: Modeling Sequential Decision Making in Team Sports using Imitation Learning

Dr. Peter Carr is working as a research scientist at Disney Research, Pittsburgh. In this talk, he introduced their recent work about modeling sequential decision making in sports using imitation learning techniques. I would like to summarize his talk from the following three critical points.

What is the objective of their work?

In terms of team sports analysis, researchers try to compare the performance of a specific teams or players with that of a typical team in a professional league, i.e. average league performance. To be able to quantitatively study players' movement patterns, it requires existence of player tracing dataset. Fortunately, with the advance of data collection techniques in recent years, it becomes possible that people gather spatiotemporal sport data by tracing players' movements in a large number of games. In the talk, Dr. Peter Carr mentioned that they have used approximately 100 games of player tracking data from a professional soccer league for modeling sequential decision making.

With such dataset, they are interested in modeling defensive situations - what players would do within under the situation where the opposition had control of the ball. They explore what a defensive player should have done, based on the average league performance revealed by data, in comparison with what they actually did. This kind of work help us better understand the overall defensive strategies of a league as well as how a certain team would play differently. In this framework, the "should-have-done" motions are learned from player tracking data through "data-driven ghosting" method. In next section, we will summarize the high-level intuition of the method.

What is the method?

Data-driven ghosting is implemented based on imitation learning. Imitation learning, also called "learning from demonstration", is a process that computer automatically learns strategies by observing expert behaviors. It is similar to what human would do in learning process - a person who has no knowledge of sports, can understand what to do if he/she has observed a sufficient number of games.

Figure 1. Deep multi-agent imitation learning framework. Single player learning (Upper) and multi-agent learning (Bottom).
Their task is to predict the action of a player at each time step given the state feature, which actually is a online sequence decision making problem. Besides, they need to predict actions for multiple players at the same time. Therefore, they proposed a deep multi-agent imitation learning framework (figure 1). Two major components are presented - single player modeling and joint training of multiple players. In stage 1, the algorithm learns a model for each player to predict average league action, and in stage 2, these pre-trained models learned in stage 1 would be used in stage 2 for joint training of multiple agents. In both stages, training and prediction are combined together, so that a model can learn from their prediction mistakes to go back to "right" track (see in figure 1).

How about the results?

An example is shown in figure 2. The data-driven ghosting players (white) and their trajectories (white) represent average league movements; colored dots and trajectories represent actual movements in games. Results have revealed that the proposed model can generate a sequence of behaviors showing spatial and formational awareness. More information can be viewed in the video:
Figure 2. Ghosting behaviors (white) in comparison of actual movements.

[1] Le, Hoang M., et al. "Data-Driven Ghosting using Deep Imitation Learning." (2017).

Week 12: FolkTrails: Interpreting Navigation Behavior in a Social Tagging System

Motivation: Social tagging systems have been widely used to organize and store online information, such as webpages and publications (Delicious and Bibsonomy). In such systems, users can freely assign keywords to specific resources for retrieval and organization in the future. By tracing users' behaviors in those tagging systems, researchers are able to understand how they assign tags, navigate among resources and consume information. This paper focuses on the specific problem of interpreting human navigation behaviors within Bibsonomy, and exploring the behavioral differences between different user subgroups.

Methods: It formulates a set of hypotheses which can be encoded as transition probability, and then apply HypTrails [1], a framework for comparing hypotheses based on empirical observations, to figure out which hypotheses better capture the intrinsic mechanism of human navigation trails. Figure 1 shows an example of user reviews for restaurants in Italy on Yelp. To be specific, HypTrails defines a navigation trail as a first-order Markov chain over a sequence of states, so that each hypothesis can be formulated as transition probabilities jumping from one state to another. To conclude which hypothesis better reflect empirical observations, it leverages Bayes factor, which compares the marginal likelihood P(D|H) where D represents real observations and H indicates hypothesis.

Figure 1. (a-c) show three hypotheses about human trails - uniform hypothesis, geo hypothesis and self-loop hypothesis, (d) reveals the empirical observation.
In this paper, six basic hypotheses are formulated:

  1. Uniform Hypothesis - user randomly select a page for next visit
  2. Page Consistent Hypothesis - user visits the same page in next step (due to pagination)
  3. Category Consistent Hypothesis - user visits page under the same category as current page
  4. User Consistent Hypothesis - a transition's target and source belong to the same user
  5. Folksonomy Consistent Hypothesis - user goes to a page by following folksonomy structure
  6. Semantic Navigation Hypothesis - user goes to a page which maintains semantic relations with current one

Three combined hypotheses are introduced:

  1. Folksonomy Consistent & Semantic Navigation Hypothesis
  2. User Consistent & Semantic Navigation Hypothesis
  3. User Consistent & Folksonomy Navigation Hypothesis

Experiment results: The above mentioned hypotheses are tested based on empirical dataset collected from the social tagging system Bibsonomy. Results show that (i) in overall, the combination of user consistent & semantic navigation hypothesis works best, and the second one is user consistent hypothesis; (ii) users show different browsing behaviors between within his/her resources and outside his/her resources - within navigations tend to be explained by semantic navigation hypothesis, while outside behaviors are likely to be following folksonomy structure; (iii) short-term and long-term users show different behaviors - short-term users are likely to follow semantic navigation while long-term users are prone to follow folksonomy structure.

[1]  Singer, P., Helic, D., Hotho, A., Strohmaier, M.: HypTrails: A bayesian approach for comparing hypotheses about human trails on the Web. In: Proc. of the 24th WWW Conf. (2015) 

Wednesday, March 22, 2017

Week 10: Exploiting User Feedback to Learn to Rank Answers in Q&A Forums: a Case Study with Stack Overflow

Title: Exploiting User Feedback to Learn to Rank Answers in Q&A Forums: a Case Study with Stack Overflow, SIGIR 2013

Motivation: Collaborative websites, such as Q&A platforms, are characterized by a loose edit control, allowing users to freely edit their questions and answers. To help distinguish or rank answers, many websites incorporate functions like asker selecting the best answers, and users commenting or vote for qualified answers etc. However, such manual assessment might not scale up to the increasing volume of data, and tends to be subject to personal biases. Therefore, this paper proposes to create an automated (or semi-automated) assessment mechanism to rank answers based on their quality features.

Methods: They apply point-wise learning to rank (L2R) approach using random forests for ranking answers in Stack Overflow. In this model, they construct a large set of features from eight groups. The eight groups are user features, user graph features, review features, structure features, length features, style features, readability features and relevance features. User and user graph features describe user's activities, reputations and influence in asking-answering graphs. They introduce review features into this model because of the intuition that a content received many edits tends to be improved towards high quality. The structure, length, style and readability features capture answers' properties from different perspectives. The relevance features describe how relevant the answer is to a specific question.

Figure 1. NDCG@k for RF, RF-BaseFeatures and other methods. The RF method is proposed by this paper, and RF-BaseFeatures only use features that have already been proposed in prior works.

Experiments: They apply the L2R model using Random Forest in Stack Overflow dataset, and find that (1) this proposed method outperforms all competing baselines regarding NDCG@k evaluation; Besides, (2) user and review features are the most significant groups of features than others.

Tuesday, February 28, 2017

Week 9: Efficient Latent Link Recommendation in Signed Networks

Motivation: Link prediction or recommendation is very popular in today's online services, such as recommending potential friends in Facebook, suggesting qualified employers in LinkedIn and recommending research colleagues in ResearchGate. Link recommendation is actually a personalized ranking problem - producing a list of ranked people to target user. Signed networks, e.g. Epinions or Slashdot, allows people to indicate whether they trust (positive) or distrust (negative) each other, so that the plausible recommendation result would be a list with positive relationships on the top and negative links at the bottom. Traditional evaluation measures - AUC - is no longer suitable for the case where there exist three types of relationships: positive, negative and unknown. A recently proposed generalized AUC (GAUC), although cares about both the head and tail of rank list, does not require that the head links are all positive or the tail links are all negative. Therefore, this paper starts from GAUC, deriving two lower bounds for GAUC which can be computed in linear time, and further develop two probabilistic models to infer personalized list of latent links.

Approach: In AUC, both negative and latent links are considered as negative. AUC compares each pair of positive and negative links, and calculates the fraction of pairs within which the positive one has higher rank than the negative one. Therefore, the perfect rank list (positive on top, negative at bottom) has AUC = 1, where the worst rank list (negative on top, positive at bottom) has AUC = 0. There are two major disadvantages in AUC: (i) it is designed for binary case, while not suitable for cases of three classes; (ii) the computation is slow as it requires to compare each pair of links. This paper presents two lower bounds - bound I and bound II. Bound I compares each positive link with that link from latent and negative classes holding the maximum ranking score, and meantime compare each negative link with the link from positive and latent classes holding the minimum ranking score. Bound II is a stricter bound. It compares the minimum ranking score in positive class with the maximum score from negative and latent classes, meanwhile compares the maximum ranking score in negative class with the minimum one from positive and latent classes. Figure 1 shows the values of AUC, GAUC, GAUC-bound-I and GAUC-bound-II for 4 ranking list.

With the goal of optimizing the above two bounds, this paper further proposes two probabilistic models for link recommendation - efficient latent link recommendation-I (ELLR-I) and efficient latent link recommendation-II (ELLR-II). Given a partially observed matrix X, they want to learn two low-rank matrices U and V such that the ranking lists can be optimized in the sense of lower bounds for GAUC. They employ Bayesian model aiming to maximize the following posterior distribution:
where > f denotes the orderings obtained by X.

Experiments: They compare their algorithms ELLR-I and ELLR-II with the state-of-the-art baselines, including network topology-based method common neighbor (CN), point-wise approach matrix factorization (MF), pairwise methods maximum margin matrix factorization (MMMF), Bayesian personalized ranking based matrix factorization (BPR+MF),  list-wise algorithms List-MF, as well as GAUC-optimization-based method OPT-GAUC. Four signed graphs are used, including Wikipedia, Slashdot, Epinions and MovieLens10M. Results reveal that ELLR-I and ELLR-II outperforms those top-performed competing methods in signed networks.