<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Nikhil Ketkar</title>
	<atom:link href="http://gameintelligencegroup.org/ketkar/feed/" rel="self" type="application/rss+xml" />
	<link>http://gameintelligencegroup.org/ketkar</link>
	<description>Just another Game Intelligence Group weblog</description>
	<lastBuildDate>Sat, 03 Oct 2009 01:51:22 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Partitioning with KMeans</title>
		<link>http://gameintelligencegroup.org/ketkar/2009/10/02/partitioning-with-kmeans/</link>
		<comments>http://gameintelligencegroup.org/ketkar/2009/10/02/partitioning-with-kmeans/#comments</comments>
		<pubDate>Sat, 03 Oct 2009 01:41:07 +0000</pubDate>
		<dc:creator>ketkar</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://nikhilketkar.wordpress.com/2009/10/02/partitioning-with-kmeans/</guid>
		<description><![CDATA[Earlier I was using KMeans to get discrete locations from continuous 3D points. I dropped the idea because it would bias the results. If I use KMeans instead of a uniform grid and then test on an unseen sample, the sample is not really unseen, because the entire dataset [...]]]></description>
			<content:encoded><![CDATA[<p id="top" />Earlier I was using KMeans to get discrete locations from continuous 3D points. I dropped the idea because it would bias the results. If I use KMeans instead of a uniform grid and then test on an unseen sample, the sample is not really unseen, because the entire dataset was used to do the discretization.</p>
<p>Here are the points in each location.<br />
<a href="http://nikhilketkar.files.wordpress.com/2009/10/kmeans-points-hist.png"><img src="http://nikhilketkar.files.wordpress.com/2009/10/kmeans-points-hist.png?w=300" alt="kmeans points hist" width="300" height="221" class="alignnone size-medium wp-image-166" /></a></p>
<p>Here are the walks going through each location.</p>
<p><a href="http://nikhilketkar.files.wordpress.com/2009/10/kmean-hist.png"><img src="http://nikhilketkar.files.wordpress.com/2009/10/kmean-hist.png?w=300" alt="kmean hist" width="300" height="214" class="alignnone size-medium wp-image-165" /></a></p>
<p>Note that in both cases K was set to 1000.</p>
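<p>A minimal sketch of the leakage point above, assuming a plain Lloyd-style k-means written from scratch. The function names (<code>fit_centroids</code>, <code>assign</code>), the toy data, and K = 10 are illustrative assumptions and not the code or settings behind the plots. The point is only that fitting the centroids on the training split keeps the held-out sample unseen.</p>

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(point, centroids):
    """Index of the nearest centroid, i.e. the discrete location of a point."""
    return min(range(len(centroids)), key=lambda k: dist2(point, centroids[k]))

def fit_centroids(points, k, iters=20, seed=0):
    """Plain Lloyd iterations; an empty cluster keeps its old centroid."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[assign(p, centroids)].append(p)
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids

# Toy data (hypothetical): 200 random 3D points.
rng = random.Random(1)
data = [
    (rng.uniform(-3000, 3000), rng.uniform(-3000, 3000), rng.uniform(-3000, 3000))
    for _ in range(200)
]
train, test = data[:150], data[150:]

# Fit on the training split only, so the test points stay unseen.
centroids = fit_centroids(train, k=10)
test_locations = [assign(p, centroids) for p in test]
```

<p>Fitting on <code>data</code> instead of <code>train</code> is the biased variant described in this post.</p>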
]]></content:encoded>
			<wfw:commentRss>http://gameintelligencegroup.org/ketkar/2009/10/02/partitioning-with-kmeans/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>UCT Transfer Learning dataset 3D Plot</title>
		<link>http://gameintelligencegroup.org/ketkar/2009/10/02/uct-transfer-learning-dataset-3d-plot/</link>
		<comments>http://gameintelligencegroup.org/ketkar/2009/10/02/uct-transfer-learning-dataset-3d-plot/#comments</comments>
		<pubDate>Sat, 03 Oct 2009 01:27:59 +0000</pubDate>
		<dc:creator>ketkar</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://nikhilketkar.wordpress.com/2009/10/02/uct-transfer-learning-dataset-3d-plot/</guid>
		<description><![CDATA[Here are some points from the UCT Transfer Learning dataset in 3D. It&#8217;s funny that we can almost see the map.

]]></description>
			<content:encoded><![CDATA[<p id="top" />Here are some points from the UCT Transfer Learning dataset in 3D. It&#8217;s funny that we can almost see the map.</p>
<p><a href="http://nikhilketkar.files.wordpress.com/2009/10/allpoints.png"><img src="http://nikhilketkar.files.wordpress.com/2009/10/allpoints.png" alt="allpoints" width="625" height="429" class="alignnone size-full wp-image-163" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://gameintelligencegroup.org/ketkar/2009/10/02/uct-transfer-learning-dataset-3d-plot/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Frequencies of points and walks in discrete location</title>
		<link>http://gameintelligencegroup.org/ketkar/2009/10/02/frequencies-of-points-and-walks-in-discrete-location/</link>
		<comments>http://gameintelligencegroup.org/ketkar/2009/10/02/frequencies-of-points-and-walks-in-discrete-location/#comments</comments>
		<pubDate>Sat, 03 Oct 2009 01:10:41 +0000</pubDate>
		<dc:creator>ketkar</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://nikhilketkar.wordpress.com/2009/10/02/frequencies-of-points-and-walks-in-discrete-location/</guid>
		<description><![CDATA[I have been superimposing a grid to get discrete locations from continuous 3D points. The following diagram should give a general idea.

Now the question is how many points lie in each discrete location, and perhaps more importantly, how many walks pass through each location?
Here is a plot for the number of points in each location.

Here [...]]]></description>
			<content:encoded><![CDATA[<p id="top" />I have been superimposing a grid to get discrete locations from continuous 3D points. The following diagram should give a general idea.</p>
<p><a href="http://nikhilketkar.files.wordpress.com/2009/10/figure41.png"><img class="alignnone size-medium wp-image-155" src="http://nikhilketkar.files.wordpress.com/2009/10/figure41.png?w=300" alt="figure41" width="300" height="111" /></a></p>
<p>Now the question is how many points lie in each discrete location, and perhaps more importantly, how many walks pass through each location?<br />
Here is a plot for the number of points in each location.<br />
<a href="http://nikhilketkar.files.wordpress.com/2009/10/points_frequency1.png"><img class="alignnone size-medium wp-image-154" src="http://nikhilketkar.files.wordpress.com/2009/10/points_frequency1.png?w=300" alt="points frequency1" width="300" height="225" /></a></p>
<p>Here is the plot for the number of walks in each location.<br />
<a href="http://nikhilketkar.files.wordpress.com/2009/10/walks_frequency1.png"><img class="alignnone size-medium wp-image-153" src="http://nikhilketkar.files.wordpress.com/2009/10/walks_frequency1.png?w=300" alt="walks frequency1" width="300" height="225" /></a></p>
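<p>The two counts above can be sketched as follows. This is only an illustration: the bounds of -3000 to 3000, the 4 partitions per axis, and the names <code>cell_of</code> and <code>frequencies</code> are assumptions, and a walk is taken to be a list of 3D points.</p>

```python
from collections import Counter

def cell_of(point, lo=-3000.0, hi=3000.0, parts=4):
    """Map a continuous 3D point to a grid cell (ix, iy, iz)."""
    width = (hi - lo) / parts
    # Clamp so that points on or beyond the bounds fall in the edge cells.
    return tuple(min(parts - 1, max(0, int((c - lo) // width))) for c in point)

def frequencies(walks, **grid):
    """Count points per cell and distinct walks per cell."""
    points_per_cell = Counter()
    walks_per_cell = Counter()
    for walk in walks:
        cells = [cell_of(p, **grid) for p in walk]
        points_per_cell.update(cells)
        walks_per_cell.update(set(cells))  # each walk counts once per cell
    return points_per_cell, walks_per_cell

# Two toy walks (hypothetical data).
walks = [
    [(-2900.0, 0.0, 10.0), (-2800.0, 5.0, 10.0), (100.0, 0.0, 0.0)],
    [(-2950.0, 20.0, 0.0), (2999.0, 2999.0, 2999.0)],
]
points_per_cell, walks_per_cell = frequencies(walks)
```

<p>Unlike KMeans, the cell boundaries here do not depend on the data, so a held-out walk is discretized exactly as it would be without ever seeing the training set.</p>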
]]></content:encoded>
			<wfw:commentRss>http://gameintelligencegroup.org/ketkar/2009/10/02/frequencies-of-points-and-walks-in-discrete-location/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Research Programming: Remember the Following</title>
		<link>http://gameintelligencegroup.org/ketkar/2009/09/23/research-programming-remember-the-following/</link>
		<comments>http://gameintelligencegroup.org/ketkar/2009/09/23/research-programming-remember-the-following/#comments</comments>
		<pubDate>Thu, 24 Sep 2009 01:42:00 +0000</pubDate>
		<dc:creator>ketkar</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://nikhilketkar.wordpress.com/?p=120</guid>
		<description><![CDATA[

For C++ code, read and write data in the simplest of formats. If complicated formats are necessary, write Python scripts. You should be able to make do with ifstream and ofstream. If you need Boost Tokenizer, Spirit or LibXML, you are opening a can of worms; what you need is a Python script and a simple [...]]]></description>
			<content:encoded><![CDATA[<p id="top" />
<ol>
<li>For C++ code, read and write data in the simplest of formats. If complicated formats are necessary, write Python scripts. You should be able to make do with ifstream and ofstream. If you need Boost Tokenizer, Spirit or LibXML, you are opening a can of worms; what you need is a Python script and a simple intermediate format.</li>
<li>Extract a small subset of real data for testing and debugging. Seriously, don&#8217;t try to work with the entire dataset.</li>
<li>Don&#8217;t trust libraries with core stuff. Write all code from scratch. You should know exactly what is going on in your code.</li>
<li>Save raw output data, all of it.</li>
<li>Separate plotting from the execution of the algorithm. You should not have to rerun the experiment to get a different plot.</li>
<li>Every once in a while extract and put away frequently used code in a library.</li>
<li>Do not use a unit testing framework; it&#8217;s overkill. Write small tests in main as you go along; when they pass, delete them and move on.</li>
<li>Replicate the algorithm as exactly as possible. No shortcuts.</li>
</ol>
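<p>A small sketch of the points about saving raw output and keeping plotting separate from execution. The experiment here is a stand-in, and the file name and the function names are made up for illustration.</p>

```python
import csv
import os
import tempfile

def run_experiment(training_sizes):
    """Stand-in for the real algorithm; returns (training_size, score) rows."""
    return [(n, 1.0 - 1.0 / (n + 1)) for n in training_sizes]

def save_raw(rows, path):
    """Write every raw result as one plain comma-separated line."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)

def load_raw(path):
    """Read results back; a separate plotting script would start here."""
    with open(path, newline="") as f:
        return [(int(n), float(score)) for n, score in csv.reader(f)]

path = os.path.join(tempfile.mkdtemp(), "results.csv")
save_raw(run_experiment([1, 10, 100]), path)
rows = load_raw(path)  # re-plot from this file without rerunning anything
```

<p>The same comma-separated file is trivial to read from C++ with ifstream, which is why the format is kept this simple.</p>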
]]></content:encoded>
			<wfw:commentRss>http://gameintelligencegroup.org/ketkar/2009/09/23/research-programming-remember-the-following/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Problems with results</title>
		<link>http://gameintelligencegroup.org/ketkar/2009/09/23/problems-with-results/</link>
		<comments>http://gameintelligencegroup.org/ketkar/2009/09/23/problems-with-results/#comments</comments>
		<pubDate>Thu, 24 Sep 2009 01:22:58 +0000</pubDate>
		<dc:creator>ketkar</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://nikhilketkar.wordpress.com/?p=68</guid>
		<description><![CDATA[There was a problem with the implementation of the Greedy approach. As seen from the previous results, for very small training sets the greedy approach is performing worse than the others. The problem was that after getting 100% coverage on the training set, the greedy implementation was not properly picking vertices at random. I think I fixed that and here are [...]]]></description>
			<content:encoded><![CDATA[<p id="top" />There was a problem with the implementation of the Greedy approach. As seen from the previous results, for very small training sets the greedy approach is performing worse than the others. The problem was that after getting 100% coverage on the training set, the greedy implementation was not properly picking vertices at random. I think I fixed that and here are the updated results.</p>
<div id="attachment_69" class="wp-caption aligncenter" style="width: 650px"><a rel="attachment wp-att-69" href="http://nikhilketkar.wordpress.com/2009/09/23/problems-with-results/100-0-01-3/"><img class="size-full wp-image-69" src="http://nikhilketkar.files.wordpress.com/2009/09/100-0-011.png" alt="1% Training on 100 partitions between -3000 and 3000" width="640" height="480" /></a><p class="wp-caption-text">1% Training on 100 partitions between -3000 and 3000</p></div>
<p>For really small training sizes here is what is observed. Note that the results are no better than random for all three approaches.</p>
<div id="attachment_117" class="wp-caption aligncenter" style="width: 650px"><a rel="attachment wp-att-117" href="http://nikhilketkar.wordpress.com/2009/09/23/problems-with-results/25-0-0025/"><img class="size-full wp-image-117" src="http://nikhilketkar.files.wordpress.com/2009/09/25-0-0025.png" alt="25 partitions between -3000 and 3000, 0.25% used for training" width="640" height="480" /></a><p class="wp-caption-text">25 partitions between -3000 and 3000, 0.25% used for training</p></div>
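<p>For reference, the intended behaviour can be sketched roughly as below. This is not the experiment code. It assumes that the greedy approach repeatedly picks the vertex that appears in the most still-uncovered training walks, and that a fixed budget of vertices must be filled; <code>greedy_placement</code> and its parameters are hypothetical names.</p>

```python
import random

def greedy_placement(walks, vertices, budget, seed=0):
    """Pick `budget` vertices: greedy coverage first, then random fill."""
    rng = random.Random(seed)
    uncovered = [set(w) for w in walks]
    chosen = []
    while len(chosen) < budget and uncovered:
        candidates = [v for v in vertices if v not in chosen]
        if not candidates:
            break
        # Vertex that appears in the most still-uncovered training walks.
        best = max(candidates, key=lambda v: sum(v in w for w in uncovered))
        if not any(best in w for w in uncovered):
            break  # no remaining vertex covers anything
        chosen.append(best)
        uncovered = [w for w in uncovered if best not in w]
    # Coverage is complete (or cannot improve): fill the rest at random.
    remaining = [v for v in vertices if v not in chosen]
    chosen += rng.sample(remaining, min(budget - len(chosen), len(remaining)))
    return chosen

# Toy training walks over vertices a..f (hypothetical data).
walks = [
    ["a", "b"],
    ["b", "c"],
    ["d"],
]
placement = greedy_placement(walks, list("abcdef"), budget=4)
```

<p>With very small training sets the greedy phase ends after a few picks, so most of the budget comes from the random fill, which is why this step matters for those results.</p>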
]]></content:encoded>
			<wfw:commentRss>http://gameintelligencegroup.org/ketkar/2009/09/23/problems-with-results/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Kernel/Distance Measure for Walks in a Graph for Classifying and Clustering Player Data</title>
		<link>http://gameintelligencegroup.org/ketkar/2009/09/23/kerneldistance-measure-for-walks-in-a-graph-for-classifying-and-clustering-player-data-2/</link>
		<comments>http://gameintelligencegroup.org/ketkar/2009/09/23/kerneldistance-measure-for-walks-in-a-graph-for-classifying-and-clustering-player-data-2/#comments</comments>
		<pubDate>Wed, 23 Sep 2009 22:29:45 +0000</pubDate>
		<dc:creator>ketkar</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://nikhilketkar.wordpress.com/?p=72</guid>
		<description><![CDATA[Suppose we have a set of walks of the form W = {(l_1, t_1), (l_2, t_2)&#8230;}, where l is a vertex and t is the time spent on that vertex. We need a distance measure, kernel, or similarity score between two walks W_i and W_j so that we can cluster or classify them. Here is an [...]]]></description>
			<content:encoded><![CDATA[<p id="top" />Suppose we have a set of walks of the form <img src='http://s.wordpress.com/latex.php?latex=W%20%3D%20%5C%7B%28l_1%2C%20t_1%29%2C%20%28l_2%2C%20t_2%29%26%238230%3B%5C%7D%20&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='W = \{(l_1, t_1), (l_2, t_2)&#8230;\} ' title='W = \{(l_1, t_1), (l_2, t_2)&#8230;\} ' class='latex' />, where <img src='http://s.wordpress.com/latex.php?latex=l&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='l' title='l' class='latex' /> is a vertex and <img src='http://s.wordpress.com/latex.php?latex=t&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='t' title='t' class='latex' /> is the time spent on that vertex. We need a distance measure, kernel, or similarity score between two walks <img src='http://s.wordpress.com/latex.php?latex=W_i&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='W_i' title='W_i' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=W_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='W_j' title='W_j' class='latex' /> so that we can cluster or classify them. Here is an idea based on <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.29.427">this paper</a> to do the same. The overall idea is based on <a href="http://en.wikipedia.org/wiki/Longest_common_subsequence_problem">Longest Common Subsequence</a> (LCS), which is basically doing a diff on two walks. To give an example, one longest common subsequence of AABBAAD and ABBDDDA is ABBD. Basically, deletions are allowed. The problem with directly applying this idea is that it does not allow us to consider the time spent on each vertex. Obviously we could duplicate vertices so as to indicate the time spent, so that <img src='http://s.wordpress.com/latex.php?latex=%5C%7B%28A%2C%203%29%2C%20%28B%2C%202%29%26%238230%3B%5C%7D%20&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\{(A, 3), (B, 2)&#8230;\} ' title='\{(A, 3), (B, 2)&#8230;\} ' class='latex' /> becomes AAABB. This is a bad idea. 
The complexity of LCS is <img src='http://s.wordpress.com/latex.php?latex=O%28mn%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='O(mn)' title='O(mn)' class='latex' /> where <img src='http://s.wordpress.com/latex.php?latex=m&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='m' title='m' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=n&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='n' title='n' class='latex' /> are lengths of input sequences. So we are going to get clobbered pretty bad if users spend too much time in some locations. Here is a trick to do this quickly.</p>
<p>First compute the LCS, ignoring the time spent on each vertex. Note that for walks <img src='http://s.wordpress.com/latex.php?latex=W_A&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='W_A' title='W_A' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=W_B&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='W_B' title='W_B' class='latex' /> we will refer to the sequences of locations as <img src='http://s.wordpress.com/latex.php?latex=L_A&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='L_A' title='L_A' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=L_B&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='L_B' title='L_B' class='latex' /> and the timings as <img src='http://s.wordpress.com/latex.php?latex=T_A&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='T_A' title='T_A' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=T_B&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='T_B' title='T_B' class='latex' />. Suppose the output of LCS for the location sequences <img src='http://s.wordpress.com/latex.php?latex=L_A&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='L_A' title='L_A' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=L_B&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='L_B' title='L_B' class='latex' /> is in the form of two functions <img src='http://s.wordpress.com/latex.php?latex=P_A&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='P_A' title='P_A' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=P_B&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='P_B' title='P_B' class='latex' /> such that <img src='http://s.wordpress.com/latex.php?latex=L_A%5BP_A%28i%29%5D%20%3D%20L_B%5BP_B%28i%29%5D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='L_A[P_A(i)] = L_B[P_B(i)]' title='L_A[P_A(i)] = L_B[P_B(i)]' class='latex' />, for <img src='http://s.wordpress.com/latex.php?latex=i&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='i' title='i' class='latex' /> going from <img 
src='http://s.wordpress.com/latex.php?latex=0&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='0' title='0' class='latex' /> to <img src='http://s.wordpress.com/latex.php?latex=n&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='n' title='n' class='latex' /> which is the length of the LCS of <img src='http://s.wordpress.com/latex.php?latex=L_A&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='L_A' title='L_A' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=L_B&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='L_B' title='L_B' class='latex' />. Now a measure of the time spent in common on a particular vertex with index <img src='http://s.wordpress.com/latex.php?latex=i&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='i' title='i' class='latex' /> can be computed as follows.</p>
<p style="text-align:center"><img src='http://s.wordpress.com/latex.php?latex=%5Cfrac%7Bmin%28T_A%5BP_A%28i%29%5D%2C%20T_B%5BP_B%28i%29%5D%29%7D%7Bmax%28T_A%5BP_A%28i%29%5D%2C%20T_B%5BP_B%28i%29%5D%29%7D%20&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\frac{min(T_A[P_A(i)], T_B[P_B(i)])}{max(T_A[P_A(i)], T_B[P_B(i)])} ' title='\frac{min(T_A[P_A(i)], T_B[P_B(i)])}{max(T_A[P_A(i)], T_B[P_B(i)])} ' class='latex' /></p>
<p style="text-align:left">Then we compute the mean of this ratio over all matched indices.</p>
<p style="text-align:center"><img src='http://s.wordpress.com/latex.php?latex=C%20%3D%20%5Cfrac%7B1%7D%7Bn%7D%20%5Csum_%7Bi%20%3D%200%7D%5En%5Cfrac%7Bmin%28T_A%5BP_A%28i%29%5D%2C%20T_B%5BP_B%28i%29%5D%29%7D%7Bmax%28T_A%5BP_A%28i%29%5D%2C%20T_B%5BP_B%28i%29%5D%29%7D%20&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='C = \frac{1}{n} \sum_{i = 0}^n\frac{min(T_A[P_A(i)], T_B[P_B(i)])}{max(T_A[P_A(i)], T_B[P_B(i)])} ' title='C = \frac{1}{n} \sum_{i = 0}^n\frac{min(T_A[P_A(i)], T_B[P_B(i)])}{max(T_A[P_A(i)], T_B[P_B(i)])} ' class='latex' /></p>
<p style="text-align:left">This does not take into consideration what fraction of the walk is common. So even if a very small part is exactly common we will get a complete match score of 1. To address this we multiply <img src='http://s.wordpress.com/latex.php?latex=C&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='C' title='C' class='latex' /> by the following.</p>
<p style="text-align:center"><img src='http://s.wordpress.com/latex.php?latex=F%20%3D%20%28%5Cfrac%7B%5Csum%5En_%7Bi%3D0%7D%7BT_A%5BP_A%28i%29%5D%7D%7D%7B%5Csum%5E%7B%7CA%7C%7D_%7Bi%3D0%7D%7BT_A%5Bi%5D%7D%7D%20%5Ctimes%20%5Cfrac%7B%5Csum%5En_%7Bi%3D0%7D%7BT_B%5BP_B%28i%29%5D%7D%7D%7B%5Csum%5E%7B%7CB%7C%7D_%7Bi%3D0%7D%7BT_B%5Bi%5D%7D%7D%29%5E%7B%5Cfrac%7B1%7D%7B2%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='F = (\frac{\sum^n_{i=0}{T_A[P_A(i)]}}{\sum^{|A|}_{i=0}{T_A[i]}} \times \frac{\sum^n_{i=0}{T_B[P_B(i)]}}{\sum^{|B|}_{i=0}{T_B[i]}})^{\frac{1}{2}}' title='F = (\frac{\sum^n_{i=0}{T_A[P_A(i)]}}{\sum^{|A|}_{i=0}{T_A[i]}} \times \frac{\sum^n_{i=0}{T_B[P_B(i)]}}{\sum^{|B|}_{i=0}{T_B[i]}})^{\frac{1}{2}}' class='latex' /></p>
<p style="text-align:left">Clearly,</p>
<ol>
<li><img src='http://s.wordpress.com/latex.php?latex=F%20%5Ctimes%20C%20%3D%200&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='F \times C = 0' title='F \times C = 0' class='latex' /> if there are no locations in common.</li>
<li><img src='http://s.wordpress.com/latex.php?latex=F%20%5Ctimes%20C%20%3D%201&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='F \times C = 1' title='F \times C = 1' class='latex' /> if walks are exactly identical.</li>
<li>In general <img src='http://s.wordpress.com/latex.php?latex=F%20%5Ctimes%20C&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='F \times C' title='F \times C' class='latex' /> is between <img src='http://s.wordpress.com/latex.php?latex=0&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='0' title='0' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=1&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='1' title='1' class='latex' />.</li>
</ol>
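<p>The whole measure can be sketched in a few lines. This is an illustration under assumptions, not a reference implementation: a walk is a list of (location, time) pairs with positive times, the matched pairs are indexed from 0 to n - 1 so that the mean divides by exactly n terms, and the names <code>lcs_alignment</code> and <code>walk_similarity</code> are made up here.</p>

```python
from math import sqrt

def lcs_alignment(a, b):
    """Return index pairs (i, j) of one longest common subsequence of a and b."""
    m, n = len(a), len(b)
    # table[i][j] = LCS length of the suffixes a[i:] and b[j:]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table forward to recover the matched positions (P_A, P_B).
    pairs, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs

def walk_similarity(walk_a, walk_b):
    """F * C for two walks given as lists of (location, time) pairs."""
    pairs = lcs_alignment([l for l, _ in walk_a], [l for l, _ in walk_b])
    if not pairs:
        return 0.0  # no locations in common
    ta = [t for _, t in walk_a]
    tb = [t for _, t in walk_b]
    # C: mean ratio of the shorter to the longer stay on each matched vertex.
    c = sum(min(ta[i], tb[j]) / max(ta[i], tb[j]) for i, j in pairs) / len(pairs)
    # F: geometric mean of the fractions of total time that were matched.
    f = sqrt(sum(ta[i] for i, _ in pairs) / sum(ta)
             * sum(tb[j] for _, j in pairs) / sum(tb))
    return f * c
```

<p>The LCS runs on the location sequences only, so the cost depends on the number of distinct stays rather than on how long a player lingers at each one.</p>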
]]></content:encoded>
			<wfw:commentRss>http://gameintelligencegroup.org/ketkar/2009/09/23/kerneldistance-measure-for-walks-in-a-graph-for-classifying-and-clustering-player-data-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Kernel/Distance Measure for Walks in a Graph for Classifying and Clustering Player Data</title>
		<link>http://gameintelligencegroup.org/ketkar/2009/09/23/kerneldistance-measure-for-walks-in-a-graph-for-classifying-and-clustering-player-data/</link>
		<comments>http://gameintelligencegroup.org/ketkar/2009/09/23/kerneldistance-measure-for-walks-in-a-graph-for-classifying-and-clustering-player-data/#comments</comments>
		<pubDate>Wed, 23 Sep 2009 21:57:08 +0000</pubDate>
		<dc:creator>ketkar</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://gameintelligencegroup.org/ketkar/?p=3</guid>
		<description><![CDATA[Suppose we have a set of walks of the form W = {(l_1, t_1), (l_2, t_2)&#8230;}, where l is a vertex and t is the time spent on that vertex. We need a distance measure, kernel, or similarity score between two walks W_i and W_j so that we can cluster or classify them. Here is an [...]]]></description>
			<content:encoded><![CDATA[<p id="top" />Suppose we have a set of walks of the form <img src='http://s.wordpress.com/latex.php?latex=W%20%3D%20%5C%7B%28l_1%2C%20t_1%29%2C%20%28l_2%2C%20t_2%29%26%238230%3B%5C%7D%20&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='W = \{(l_1, t_1), (l_2, t_2)&#8230;\} ' title='W = \{(l_1, t_1), (l_2, t_2)&#8230;\} ' class='latex' />, where <img src='http://s.wordpress.com/latex.php?latex=l&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='l' title='l' class='latex' /> is a vertex and <img src='http://s.wordpress.com/latex.php?latex=t&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='t' title='t' class='latex' /> is the time spent on that vertex. We need a distance measure, kernel, or similarity score between two walks <img src='http://s.wordpress.com/latex.php?latex=W_i&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='W_i' title='W_i' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=W_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='W_j' title='W_j' class='latex' /> so that we can cluster or classify them. Here is an idea to do the same. The overall idea is based on <a href="http://en.wikipedia.org/wiki/Longest_common_subsequence_problem">Longest Common Subsequence</a> (LCS), which is basically doing a diff on two walks. To give an example, one longest common subsequence of AABBAAD and ABBDDDA is ABBD. Basically, deletions are allowed. The problem with directly applying this idea is that it does not allow us to consider the time spent on each vertex. Obviously we could duplicate vertices so as to indicate the time spent, so that <img src='http://s.wordpress.com/latex.php?latex=%5C%7B%28A%2C%203%29%2C%20%28B%2C%202%29%26%238230%3B%5C%7D%20&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\{(A, 3), (B, 2)&#8230;\} ' title='\{(A, 3), (B, 2)&#8230;\} ' class='latex' /> becomes AAABB. This is a bad idea. 
The complexity of LCS is <img src='http://s.wordpress.com/latex.php?latex=O%28mn%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='O(mn)' title='O(mn)' class='latex' /> where <img src='http://s.wordpress.com/latex.php?latex=m&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='m' title='m' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=n&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='n' title='n' class='latex' /> are lengths of input sequences. So we are going to get clobbered pretty bad if users spend too much time in some locations. Here is a trick to do this quickly.</p>
<p>First compute the LCS, ignoring the time spent on each vertex. Note that for walks <img src='http://s.wordpress.com/latex.php?latex=W_A&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='W_A' title='W_A' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=W_B&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='W_B' title='W_B' class='latex' /> we will refer to the sequences of locations as <img src='http://s.wordpress.com/latex.php?latex=L_A&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='L_A' title='L_A' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=L_B&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='L_B' title='L_B' class='latex' /> and the timings as <img src='http://s.wordpress.com/latex.php?latex=T_A&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='T_A' title='T_A' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=T_B&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='T_B' title='T_B' class='latex' />. Suppose the output of LCS for the location sequences <img src='http://s.wordpress.com/latex.php?latex=L_A&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='L_A' title='L_A' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=L_B&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='L_B' title='L_B' class='latex' /> is in the form of two functions <img src='http://s.wordpress.com/latex.php?latex=P_A&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='P_A' title='P_A' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=P_B&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='P_B' title='P_B' class='latex' /> such that <img src='http://s.wordpress.com/latex.php?latex=L_A%5BP_A%28i%29%5D%20%3D%20L_B%5BP_B%28i%29%5D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='L_A[P_A(i)] = L_B[P_B(i)]' title='L_A[P_A(i)] = L_B[P_B(i)]' class='latex' />, for <img src='http://s.wordpress.com/latex.php?latex=i&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='i' title='i' class='latex' /> going from <img 
src='http://s.wordpress.com/latex.php?latex=0&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='0' title='0' class='latex' /> to <img src='http://s.wordpress.com/latex.php?latex=n&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='n' title='n' class='latex' /> which is the length of the LCS of <img src='http://s.wordpress.com/latex.php?latex=L_A&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='L_A' title='L_A' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=L_B&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='L_B' title='L_B' class='latex' />. Now a measure of the time spent in common on a particular vertex with index <img src='http://s.wordpress.com/latex.php?latex=i&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='i' title='i' class='latex' /> can be computed as <img src='http://s.wordpress.com/latex.php?latex=%5Cfrac%7Bmin%28T_A%5BP_A%28i%29%5D%2C%20T_B%5BP_B%28i%29%5D%29%7D%7Bmax%28T_A%5BP_A%28i%29%5D%2C%20T_B%5BP_B%28i%29%5D%29%7D%20&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\frac{min(T_A[P_A(i)], T_B[P_B(i)])}{max(T_A[P_A(i)], T_B[P_B(i)])} ' title='\frac{min(T_A[P_A(i)], T_B[P_B(i)])}{max(T_A[P_A(i)], T_B[P_B(i)])} ' class='latex' />.</p>
]]></content:encoded>
			<wfw:commentRss>http://gameintelligencegroup.org/ketkar/2009/09/23/kerneldistance-measure-for-walks-in-a-graph-for-classifying-and-clustering-player-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Hello world!</title>
		<link>http://gameintelligencegroup.org/ketkar/2009/09/21/hello-world/</link>
		<comments>http://gameintelligencegroup.org/ketkar/2009/09/21/hello-world/#comments</comments>
		<pubDate>Tue, 22 Sep 2009 06:50:59 +0000</pubDate>
		<dc:creator>ketkar</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false"></guid>
		<description><![CDATA[Welcome to Game Intelligence Group. This is your first post. Edit or delete it, then start blogging!
]]></description>
			<content:encoded><![CDATA[<p id="top" />Welcome to <a href="http://gameintelligencegroup.org/">Game Intelligence Group</a>. This is your first post. Edit or delete it, then start blogging!</p>
]]></content:encoded>
			<wfw:commentRss>http://gameintelligencegroup.org/ketkar/2009/09/21/hello-world/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Advertisement Placement Results</title>
		<link>http://gameintelligencegroup.org/ketkar/2009/09/21/advertisement-placement-results/</link>
		<comments>http://gameintelligencegroup.org/ketkar/2009/09/21/advertisement-placement-results/#comments</comments>
		<pubDate>Mon, 21 Sep 2009 14:53:47 +0000</pubDate>
		<dc:creator>ketkar</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://nikhilketkar.wordpress.com/?p=61</guid>
		<description><![CDATA[Completed experiment comparing Greedy, Top-N and Markov placement. Seems like Greedy outperforms others except in the case of small training sets.
Here is a pdf, 100-0.01
Here is a pdf, 100-0.05
For larger training sets and smaller number of partitions the results are pretty much the same.
]]></description>
			<content:encoded><![CDATA[<p id="top" />Completed experiment comparing Greedy, Top-N and Markov placement. Seems like Greedy outperforms others except in the case of small training sets.</p>
<div id="attachment_64" class="wp-caption aligncenter" style="width: 650px"><a href="http://nikhilketkar.files.wordpress.com/2009/09/100-0-01.png"><img class="size-full wp-image-64" src="http://nikhilketkar.files.wordpress.com/2009/09/100-0-01.png" alt="1% Training, 100 partitions (-3000 to 3000)" width="640" height="480" /></a><p class="wp-caption-text">1% Training, 100 partitions (-3000 to 3000)</p></div>
<p>Here is a pdf, <a href="http://nikhilketkar.files.wordpress.com/2009/09/100-0-01.pdf">100-0.01</a></p>
<div id="attachment_65" class="wp-caption aligncenter" style="width: 650px"><a href="http://nikhilketkar.files.wordpress.com/2009/09/100-0-05.png"><img class="size-full wp-image-65" src="http://nikhilketkar.files.wordpress.com/2009/09/100-0-05.png" alt="5% Training, 100 partitions between (-3000, 3000)" width="640" height="480" /></a><p class="wp-caption-text">5% Training, 100 partitions between (-3000, 3000)</p></div>
<p>Here is a pdf, <a href="http://nikhilketkar.files.wordpress.com/2009/09/100-0-05.pdf">100-0.05</a></p>
<p>For larger training sets and smaller number of partitions the results are pretty much the same.</p>
]]></content:encoded>
			<wfw:commentRss>http://gameintelligencegroup.org/ketkar/2009/09/21/advertisement-placement-results/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How are clustering algorithms evaluated?</title>
		<link>http://gameintelligencegroup.org/ketkar/2009/09/19/how-are-clustering-algorithms-evaluated/</link>
		<comments>http://gameintelligencegroup.org/ketkar/2009/09/19/how-are-clustering-algorithms-evaluated/#comments</comments>
		<pubDate>Sun, 20 Sep 2009 04:19:04 +0000</pubDate>
		<dc:creator>ketkar</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://nikhilketkar.wordpress.com/?p=56</guid>
		<description><![CDATA[Here is a list, but I need to explore further. I am not sure how this works if the distance measure is bad.

Dunn&#8217;s Index
Davies-Bouldin Index
Partition Coefficient
Classification Entropy
Separation Index
Fuzzy Hypervolume
CS Index
Calinski-Harabasz Index
I-Index

A good reference is A new index of cluster validity. Also look at Performance evaluation of some Clustering Algorithms and Validity Indices and A new cluster [...]]]></description>
			<content:encoded><![CDATA[<p id="top" />Here is a list, but I need to explore further. I am not sure how this works if the distance measure is bad.</p>
<ol>
<li>Dunn&#8217;s Index</li>
<li>Davies-Bouldin Index</li>
<li>Partition Coefficient</li>
<li>Classification Entropy</li>
<li>Separation Index</li>
<li>Fuzzy Hypervolume</li>
<li>CS Index</li>
<li>Calinski-Harabasz Index</li>
<li>I-Index</li>
</ol>
<p>A good reference is <a href="http://nikhilketkar.files.wordpress.com/2009/09/volltext.pdf">A new index of cluster validity</a>. Also look at <a href="http://portal.acm.org/citation.cfm?id=628859">Performance evaluation of some Clustering Algorithms and Validity Indices</a> and <a href="http://portal.acm.org/citation.cfm?id=289724">A new cluster validity index for the fuzzy c-mean</a>.</p>
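<p>As a concrete example of one entry in the list, here is a rough sketch of Dunn&#8217;s index in its common form: the smallest distance between points of different clusters divided by the largest within-cluster diameter. It assumes Euclidean distance, at least two clusters, and a hard clustering given as a list of point lists; <code>dunn_index</code> is just an illustrative name.</p>

```python
from itertools import combinations
from math import dist

def dunn_index(clusters):
    """Smallest between-cluster distance over largest within-cluster diameter."""
    separation = min(
        dist(p, q)
        for c1, c2 in combinations(clusters, 2)
        for p in c1
        for q in c2
    )
    diameter = max(
        (dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    # All clusters are single points: treat as perfectly compact.
    return separation / diameter if diameter > 0 else float("inf")

# Two well-separated toy clusters (hypothetical data).
clusters = [
    [(0.0, 0.0), (0.0, 1.0)],
    [(5.0, 0.0), (5.0, 1.0)],
]
score = dunn_index(clusters)
```

<p>Higher is better. The score is computed entirely from the distance function, so it shares whatever flaws the distance measure has, which is the concern raised at the top of this post.</p>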
]]></content:encoded>
			<wfw:commentRss>http://gameintelligencegroup.org/ketkar/2009/09/19/how-are-clustering-algorithms-evaluated/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
