Home > Uncategorized > Kernel/Distance Measure for Walks in a Graph for Classifying and Clustering Player Data

Kernel/Distance Measure for Walks in a Graph for Classifying and Clustering Player Data

September 23rd, 2009 ketkar Leave a comment Go to comments

Given that we have a set of walks of the form W = \{(l_1, t_1), (l_2, t_2)…\} , where l is a vertex and t is the time spent of that vertex. Now we need a distance measure or kernel or similarity score between two walks W_i and W_j so that we can cluster or classify them with. Here is an idea to do the same. The overall idea is based on Longest Common Subsequence (LCS) which basically doing a diff on two walks. To give an example, the longest common subsequence of AABBAAD and ABBDDDA is ABBD. Basically deletions are allowed. The problem with directly applying this idea is that it does not allow us to consider the time spent on each vertex. Obviously we could duplicate vertices so as to indicate the time  spent, so that  \{(A_1, 3), (B, 2)…\} becomes AAABB. This is a bad idea. The complexity of LCS is O(mn) where m and n are lengths of input sequences. So we are going to get clobbered pretty bad if users spend too much time in some locations. Here is a trick to do this quickly.

First compute the LCS, ignoring the time spent on each vertex. Note that for walks for W_A and W_B we will refer to the sequence of locations as L_A and L_B and timings as T_A and T_B. Suppose the output of LCS for walks L_A and L_B is in the form to two functions P_A and P_B such that L_A[P_A(i)] = L_B[P_B(i)], for i going from 0 to n which is the length of the LCS of L_A and L_B. Now a measure of the time spent in common on a particular vertex with index i can be computed as \begin{equation} \frac{min(T_A[P_A(i)],T_B[P_B(i)])}{min(T_A[P_A(i)],T_B[P_B(i)])} \end{equation}

Categories: Uncategorized Tags:
  1. No comments yet.
  1. No trackbacks yet.