
Thursday, September 8, 2016

Web as a textbook: Curating Targeted Learning Paths through the Heterogeneous Learning Resources on the Web. Igor Labutov and Hod Lipson


This week I attended a very interesting talk at the Pittsburgh iSchool, given by Igor Labutov, who is currently a postdoc in the Machine Learning Department at CMU. The talk was about how to curate webpages into an organized structure that helps people understand unfamiliar concepts. Labutov pointed out that more and more people search for technical and academic resources using online search engines such as Google, Bing and Yahoo. However, the retrieved webpages vary in their level of detail, and each one assumes that the reader has already mastered certain prerequisite knowledge. As a result, many of us have had the frustrating experience of searching for a tutorial or lecture to understand one concept, only to run into more and more unfamiliar concepts. To deal with this issue, Labutov and Lipson propose a task: “organizing heterogeneous educational resources on the web into a structure akin to a textbook or a course, allowing the learner to navigate a sequence of webpages that take them from point A (their prior knowledge) to point B (material they want to learn)”.
Figure 1. In this excerpt explaining Expectation Maximization, solid-underlined terms are explained and dash-underlined terms are assumed.
Their approach has two steps. (i) First, they treat a document as a bag of terms and classify each term into one of two categories: explained or assumed. Assumed terms are prerequisite knowledge that the reader is expected to know already in order to follow the document. Explained terms are the concepts the document itself explains, i.e., what a reader aims to learn from it. Figure 1 illustrates the two categories. They train a logistic regression model to perform this binary classification of terms, relying on two textbooks: Rice University’s Online Statistics Education: An Interactive Multimedia Course of Study (referred to as STATBOOK) and Bishop’s Pattern Recognition and Machine Learning (referred to as PRML). The classifier can draw on a range of features, including the context of each term, its position and its frequency rank.

(ii) In the second step, Labutov and Lipson aim to find an optimal path from the user’s current position, represented as a set of terms the user already knows, to the destination, a target document containing a set of unfamiliar concepts. Along the path, the user must acquire all necessary prerequisite knowledge before moving on to the next document(s), and finally arrives at the target. Formally, they try to find “a self-contained sequence of documents of minimal length that cover all of the concepts needed to understand the target document” (shown in Figure 2).
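To make step (i) more concrete, here is a minimal Python sketch of a logistic-regression term classifier trained by plain gradient descent. It is only an illustration under my own assumptions: the three features (inverse frequency rank, relative position of the first mention, and a binary "defining cue" flag standing in for context) and the tiny labeled dataset are hypothetical, not the authors' feature set or their STATBOOK/PRML data.

```python
import math

# Hypothetical per-term features (illustrative only, not the paper's feature set):
#   freq_rank: rank of the term by frequency in the document (1 = most frequent)
#   first_pos: relative position of the term's first mention (0.0 = start, 1.0 = end)
#   has_cue:   1 if a defining phrase such as "is defined as" appears near the term
def featurize(freq_rank, first_pos, has_cue):
    # Leading 1.0 is the bias; inverse rank gives frequent terms larger values.
    return [1.0, 1.0 / freq_rank, first_pos, float(has_cue)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=1.0, epochs=2000):
    """Fit weights by batch gradient descent on the mean cross-entropy loss."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j, xj in enumerate(xi):
                grad[j] += (p - yi) * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def prob_explained(w, x):
    """P(term is explained) under the fitted model; 1 - p is P(term is assumed)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))


# Toy labeled terms: 1 = explained, 0 = assumed (made-up data).
training = [
    ((1, 0.05, 1), 1), ((2, 0.10, 1), 1), ((3, 0.20, 1), 1), ((2, 0.15, 1), 1),
    ((10, 0.60, 0), 0), ((15, 0.40, 0), 0), ((8, 0.80, 0), 0), ((20, 0.70, 0), 0),
]
X = [featurize(*feats) for feats, _ in training]
y = [label for _, label in training]
w = train_logreg(X, y)

print(round(prob_explained(w, featurize(1, 0.05, 1)), 3))  # frequent, early, defining cue
print(round(prob_explained(w, featurize(20, 0.7, 0)), 3))  # rare, late, no cue
```

A probability above 0.5 would label a term as explained, below 0.5 as assumed. In practice an off-the-shelf implementation such as scikit-learn's `LogisticRegression` would replace the hand-written training loop; it is spelled out here only to show what the model computes.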
Figure 2. An optimal path (document sequence). (left) For each document (x-axis), the concepts it contains are marked (y-axis): blue indicates explained terms and red indicates assumed terms. (right) The optimal path as a directed graph: nodes correspond to documents and directed edges show prerequisite relationships.
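Step (ii) can be illustrated with a brute-force Python sketch under simplifying assumptions of my own: each document is reduced to a set of explained and a set of assumed concepts, the reader starts from a set of prior concepts, and subsets of documents are tried in order of increasing size until one can be read in some order without ever meeting an unknown assumed concept and leaves every concept assumed by the target known. The document names and concepts are made up, and exhaustive search is exponential in the number of documents, so this only spells out the definition; it is not the optimization procedure Labutov and Lipson use.

```python
from itertools import combinations


def reading_order(docs, chosen, prior):
    """Try to order the `chosen` documents so that each one's assumed concepts are
    already known when it is read. Returns (order, known_after) or None."""
    known = set(prior)
    remaining = list(chosen)
    order = []
    while remaining:
        # Knowledge only grows, so greedily reading any readable document is safe.
        ready = [d for d in remaining if docs[d]["assumed"] <= known]
        if not ready:
            return None
        doc = ready[0]
        order.append(doc)
        known |= docs[doc]["explained"]
        remaining.remove(doc)
    return order, known


def shortest_learning_path(docs, target, prior):
    """Exhaustive search over document subsets of increasing size (exponential time).
    Returns the shortest self-contained reading list ending at `target`, or None."""
    needed = docs[target]["assumed"]
    candidates = [d for d in docs if d != target]
    for k in range(len(candidates) + 1):
        for chosen in combinations(candidates, k):
            result = reading_order(docs, chosen, prior)
            if result is not None and needed <= result[1]:
                return result[0] + [target]
    return None


# Hypothetical documents: explained vs. assumed concepts for each page.
docs = {
    "em_tutorial": {"explained": {"EM"},
                    "assumed": {"likelihood", "latent variable", "Gaussian mixture"}},
    "mle_notes": {"explained": {"likelihood"}, "assumed": {"probability"}},
    "gmm_intro": {"explained": {"Gaussian mixture", "latent variable"},
                  "assumed": {"Gaussian", "probability"}},
    "gaussian_basics": {"explained": {"Gaussian"}, "assumed": {"probability"}},
    "prob_101": {"explained": {"probability"}, "assumed": set()},
}

path = shortest_learning_path(docs, "em_tutorial", prior={"probability"})
print(path)  # ['mle_notes', 'gaussian_basics', 'gmm_intro', 'em_tutorial']
```

Here the reader already knows "probability", so `prob_101` is skipped, and `gaussian_basics` has to come before `gmm_intro` because the latter assumes "Gaussian". Checking each candidate set greedily is safe because knowledge only grows: reading a document whose prerequisites are met can never make another document unreadable.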
This talk covered only their preliminary work. Their next goal is to customize search results by taking into account a user’s personal characteristics, such as prior knowledge, requirements and available time. Their long-term goal is to develop next-generation lifelong learning tools that leverage the whole web. As the paper puts it, “we hope that this work, in addition to the datasets that we release, will serve to inspire interest from the community in what we believe is a challenging and an important task.”

Details about the talk:

Talk URL:
Title: Web as a textbook: Curating Targeted Learning Paths through the Heterogeneous Learning Resources on the Web.
Speaker: Igor Labutov
Paper URL:
