Notes on Clickstream Clustering Paper
The paper proposes a technique to cluster web users based on the longest common subsequence of their clickstreams, taking into account both the walk through the website and the time spent on each page. The motivation is to find groups of users with similar interests or motivations behind visiting the website. This assumes that similarity in clickstreams indicates similarity in those interests or motivations.
The similarity measure between two walks is based on the longest common subsequence (LCS). This is quite well studied, and through dynamic programming it is possible to compute the LCS in O(mn) time, where m and n are the lengths of the input sequences. The key contribution of this work is to extend this approach to the case where a real number (the time spent on each page) is associated with each vertex in the walk. Here is a snippet from the paper.
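Separately from the paper's own snippet, here is a minimal sketch of the textbook LCS dynamic program mentioned above, to make the quadratic running time concrete. It is the plain version only: it ignores the time spent on each page and is not the paper's extended measure. The function name lcs_length and the page labels in the example are my own, chosen purely for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b.

    Textbook dynamic program; this is NOT the paper's time-weighted
    variant, just the standard LCS it builds on.
    """
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                # Matching pages extend the best subsequence of the shorter prefixes.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # Otherwise drop the last page of one walk or the other.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[m][n]


# Hypothetical clickstreams; the page names are made up for illustration.
walk_1 = ["home", "search", "item", "cart"]
walk_2 = ["home", "item", "help", "cart"]
print(lcs_length(walk_1, walk_2))
```

The two nested loops fill an (m+1) by (n+1) table with constant work per cell, which is where the bound comes from.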
I am not quite sure why the geometric mean is used; other than that, the approach is solid. The authors use a graph-based clustering approach and perform experiments with data from sulekha.com.
