Notes on Clickstream Clustering Paper

September 19th, 2009 ketkar

The paper proposes a technique to cluster web users based on the longest common subsequence of their clickstreams, taking into account both the walk through the website and the time spent on each page. The motivation is to find groups of users with similar interests or motivations behind visiting the website. This assumes that similarity in clickstreams indicates similarity in those interests or motivations.

The similarity measure between two walks is based on the longest common subsequence (LCS). LCS is well studied, and dynamic programming computes it in O(mn) time, where m and n are the lengths of the two input sequences. The key contribution of this work is to extend this approach to walks in which each vertex (page) carries a real number: the time spent on that page. Here is a snippet from the paper.
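As a reminder of the baseline the paper builds on, here is a minimal Python sketch of the textbook O(mn) LCS dynamic program applied to two walks. The page names are made-up examples, not data from the paper.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b.

    Classic dynamic program in O(m*n) time and space: dp[i][j] holds the
    LCS length of the prefixes a[:i] and b[:j].
    """
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                # Same page: extend the LCS of the two shorter prefixes.
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                # Different pages: drop the last page of one walk or the other.
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]


# Hypothetical walks through a site.
walk_a = ["home", "products", "cart", "checkout"]
walk_b = ["home", "search", "products", "checkout"]
print(lcs_length(walk_a, walk_b))  # prints 3 (home, products, checkout)
```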

[Image: Key idea behind extending LCS for clickstreams]
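The paper's exact formulation for time spent on pages is only in the image above, so the following is a hypothetical sketch, not the authors' method. It assumes each walk is a list of (page, seconds) pairs and that a matched page contributes min(t1, t2)/max(t1, t2) instead of 1; the function names and that scoring rule are my own illustrative choices. The point it shows is that the O(mn) recurrence carries over once each match has a weight.

```python
def dwell_similarity(t1, t2):
    """Hypothetical per-page score in [0, 1]: 1.0 when both users spent the
    same time on the page, smaller as the times diverge. Non-positive times
    get no credit."""
    if t1 <= 0 or t2 <= 0:
        return 0.0
    return min(t1, t2) / max(t1, t2)


def weighted_lcs(walk_a, walk_b):
    """Maximum total score of a common subsequence of two timed walks.

    Each walk is a list of (page, seconds) pairs. A matched page adds
    dwell_similarity(...) rather than 1; otherwise the recurrence is the
    plain O(m*n) LCS one.
    """
    m, n = len(walk_a), len(walk_b)
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        page_a, time_a = walk_a[i - 1]
        for j in range(1, n + 1):
            page_b, time_b = walk_b[j - 1]
            # Skip the last page of either walk...
            best = max(dp[i - 1][j], dp[i][j - 1])
            # ...or, if the pages match, take the weighted match.
            if page_a == page_b:
                best = max(best, dp[i - 1][j - 1] + dwell_similarity(time_a, time_b))
            dp[i][j] = best
    return dp[m][n]


a = [("home", 10), ("products", 60), ("checkout", 30)]
b = [("home", 10), ("products", 30), ("checkout", 30)]
print(weighted_lcs(a, b))  # prints 2.5 (1.0 + 0.5 + 1.0)
```

Keeping the three-way max (rather than always taking a match) matters once matches have fractional weights, since a low-scoring match should never be forced.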

I am not quite sure why the geometric mean is used; other than that, the approach is solid. The authors use a graph-based clustering approach and run experiments with data from sulekha.com.
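This post does not say exactly what the geometric mean is applied to, nor which graph-based clustering algorithm the authors use. As a purely hypothetical sketch: normalize the LCS length by the geometric mean sqrt(m*n) of the two walk lengths (one common choice for LCS-based similarity, assumed here), link walks whose similarity reaches a made-up threshold of 0.6, and treat connected components of that graph as clusters. Connected components are a simple stand-in, not necessarily the paper's clustering method.

```python
import math


def lcs_length(a, b):
    """O(m*n) LCS length, using one rolling row instead of a full table."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Assumed normalization: LCS length divided by the geometric mean of
    the two walk lengths, giving a score in [0, 1]."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / math.sqrt(len(a) * len(b))


def cluster_walks(walks, threshold=0.6):
    """Hypothetical graph clustering: add an edge between walks whose
    similarity is >= threshold, then return connected components as
    sorted lists of walk indices."""
    n = len(walks)
    adjacency = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if similarity(walks[i], walks[j]) >= threshold:
                adjacency[i].append(j)
                adjacency[j].append(i)
    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))
    return clusters


# Made-up walks: two shopping sessions and two blog-reading sessions.
walks = [
    ["home", "products", "cart", "checkout"],
    ["home", "products", "checkout"],
    ["home", "blog", "post", "comments"],
    ["blog", "post", "comments"],
]
print(cluster_walks(walks))  # prints [[0, 1], [2, 3]]
```

One property of this normalization: the geometric mean of two lengths is never larger than their arithmetic mean, so dividing by sqrt(m*n) penalizes a length mismatch between walks less harshly than dividing by (m + n)/2 would.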
