The kd-tree is a well-known data structure, which alongside quadtrees and octrees, is commonly used for the task of space partitioning. That is, it aims to represent a set of points in k-dimensional space in a way that is efficient to access. Kd-trees, as do all binary trees, have the notable advantage of an average search time, far superior to the naive
search.
While such trees perhaps have their best known application in computer graphics, I discovered them here in relation to a computational physics project on which I am currently working. For those who are curious about the context, I have recently been doing some statistical analysis on the distribution stars using the Hipparcos catalogue. Unfortunately due to sheer number of stars (100,000+), performing a nearest neighbour search or range search on each star in 3D (Euclidean) space is simply infeasible without the use of a clever algorithm. This is how I discovered the wonder of kd-trees, reducing a task that would otherwise take days to several minutes.
The software I am writing for this project happens to be in C# 4.0, and upon a bit of investigation, I discovered that there exists no (decent) implementation of the kd-tree structure and its algorithms for .NET. Thus, with a bit of help from the Wikipedia article and a paper I dug up, I decided to have a go at a generic implementation. Now that it is complete and tested, I can gladly declare that the job was not half as easy as I expected (but somehow quite rewarding)!
The following features have been implemented (and fully documented with XML comments).
- Tree construction
- Dynamic node addition
- Dynamic node removal
- Nearest neighbour search
- Range search
I’ve now uploaded the complete code for the implementation, which is available below as several C# code files. Note that you will also need the (wonderful) Math.NET Numerics library to compile things – it provides the generic vector type and a few other handy things. (You can probably adapt the code easily enough if you don’t fancy having this dependency.)
Downloads
(All code is licensed under the MIT license. Copyright 2011 Alex Regueiro.)
Tests (MSTest framework)
The idea is that the KdTree class is available to use as a black box, so let me know if anything is unclear. (If you’re wondering though, the arithmetic stuff is there to provide generic arithmetic support complementing the generic vector support. You’ll probably just want to use double most of the time.) Finally, if you’re interested in the unit tests, you can grab them over at the project repository. Enjoy.
Hey – Noldorin – this looks great – I have been using the Numerical Recipes impl but calling it back and forth from a C# harness which is SLOWWW. Two things are not clear to me however. Am I able to retreive items which are within a radius of r of say point P ? (Is that the public IEnumerable FindInRange(Vector location, TField range) method ) Also, TField and TValue ?? I have a list of Point structures which I want to put into the tree. What is the signature for my KDtree initialisation ? Many thanks for this (my project is a toy Lego robot navigation project ) Many thanks again
Hi -
Many thanks.
I posted earlier – please ignore – having looked at the unit tests all is clear – this is a wonderful piece of code !!
I was going to port NRC, but this has saved a whole pile of aggravation.
Great job.
Best
Dave
Glad to hear that. And yeah, hopefully the unit tests act as the documentation! I use Math.NET for the vector-based stuff in code, but it should be easy enough to substitute that dependency.
Good luck with your project; sounds fun!
It looks great, it’s really sad that it depends on Math.Net Numerics though. I’m not quite sure whether it would be easiest to modify this to avoid this dependency or write it from scratch
Yeah, the Math.NET Numerics dependency is purely because I originally wrote this code for a computational physics project, and we tied into that dependency anyway. You can abstract it easily enough by writing your own simple Vector classes, I think. Give it a go and let me know if you run into any specific problems.
I ended up writing the whole data structure myself. Your version is built to be very flexible (working on any range of dimension and data types) and could for such be used in a data structure framework, whereas I built mine very specific for 2D using the “vector” class that was present in my project.
Thanks for the inspiration though
Well, I’m glad it helped at least. The algorithmic implementations are there, for sure. But I agree, it is meant to be a pretty generic class, so I can understand a rewrite for your specific needs…
Hey Qua – any chance of you sharing that code? I need a 2D implementation for a mapping project. If you can contribute it would be deeply appreciated. Thanks. yakadum @ yahoo . com
No, you *don’t* need a 2D implementation. This is a generic n-d implementation already. If you simply want to remove the Math.NET Numerics matter, then that’s another matter – a trivial one in fact.
Noldorin, thanks for the excellent implementation. I would like to implement a specialised version but I was wondering what is the license required if any to use this code. For instance GNU GPL ?
Hi Hass. I’m glad you find it useful. You’re right, the code really needs a license (especially now that it’s garnering a lot of third-party interest.) I’ve updated the post to indicate that everything’s licensed under the MIT license – you should find this fairly unrestrictive.
Thanks Noldorin, that’s great
Hi!
Would it be possible to put a sample code in C# illustrating the use of the tree? Just a few lines if possible, please. I should also say that it is an excellent implementation, the quality of which is exceeding the many – if not all of – code snippets available on the internet in terms of C# examples.
You make a fair point. I just realised when I read your comment that the post lacked examples. For the sake of simplicity, I’ve simply made the KdTree unit tests (for the project I was working on) available for download. Hopefully this (with the comments) should be sufficient to get you going, but let me know if you need any further clarification.
Hi Noldorin,
I thank you very much for your quick response. I am sorry I forgot in my previous post to mention that I had already checked the unit test. I should say that I did not get much information partly because the Math.Net Numerics is also completely new to me. However, I figured out how it “might” work. Yet, I do not have the full confidence to my interpretation of your data structure.
If it does not much work, I kindly ask a very simple example (just a few lines) targeting how you initialize the structure and how/what you get at the end.
Regards
Are you sure that your Remove method works correctly? My implementation took origin in yours, but now that I starting to use Remove it seems strange to me. As far as I can see you only compare a single dimension and then if this single dimension value is identical you proclaim the two elements equal and remove them. There is no guarantee that the elements are equal, however, and this introduce a bug.
It might just be my implementation, but it looks like yours have the same flaw.
Hi Qua.
You could well be right. The unit tests for Remove are not extensive. Indeed, I did not actually use the Remove functionality actively in my scientific project.
However, I’m tempted to believe that the fix required is small. Perhaps you could provided a test case (just an extra few lines in the unit tests file) to demonstrate the problem? Would be happy to fix it then.
Cheers, Alex
Try adding the nodes (0,0) followed by (0,1) and then remove (0,1) and see if the tree is what you expected.
Hi Noldorin, thank you very much for your work on this. I’m attempting to leverage you code and ran into a problem compiling the KdTreeTests.cs. In the Find TestFindNearestNNeighbours method it looks like the test is calling a method that doesn’t appear to be implimented in the KdTree class that would allow you to find the nearest N Neighbours instead of only nearest Neighbour. Is there a version of your work that includes an overloaded FindNearestNeighbours method that will search for multiple Neighbours?
Hi Ian.
You’re absolutely right. I added the CS file for the unit tests at a later date, having updated the core KdTree implementation in the meanwhile (for my FermiSim project, linked above). I’ve now updated the KdTree.cs and KdTreeNode.cs files to reflect the latest revision of my project (now inactive). It provides the requisite n-nearest-neighbours implementation.
Hope that helps.
Hey Noldorin, looking at FindNearestNNeighbors I see that you only insert values into the list when they are within the hypersphere of smaller distance. However, this only makes sense when the size of the list is numNeighbors; before that, you should probably add to the list every value you come across. Otherwise, you might have the closest value your location at the root, which would result in a small hypersphere and no more elements would be picked. Am I missing something?
Also, why would you remove from the output a node just because it is the search location? As any other value in the tree, I would expect it goes into the output. Is there any restriction I’m not aware of?
In any case, thanks for the code. It’s very useful!
Hi Cesar. Sorry for the delayed response. If you’re still interested; yes, you are right that the check to see whether a node is within the hypersphere should only be made if the list is already “full”. I threw together this method quite quickly, and since I don’t believe I ever used it in my project, it was never thoroughly tested. (It passed the basic unit tests.) Thanks for pointing it out though.
I exclude nodes at the identical location to the search location for semantic reasons. In the application of my project, the input location was that of a star (arbitrary node), and we never wanted to return it back. i.e. It’s not technically one of the “nearest neighbours” of a star, but rather the star itself. This should be a trivial change, however, if your application demands it.
Glad the implementation was of help. Also, I’d certainly appreciate a patch/fix for the first issue (FindNearestNNeighbors) if you have one.
Hello Noldorin,
thanks for that code. I actually try to export a little program to .NET which I wrote in Matlab a year ago. I used a KD-tree to speed up the determination of closest triangles. I was very glad when I found your code but now I stumbled over the fact that the indices from the unsorted raw data array get lost after the construction of the tree. Is that correct or did I miss something?
Thanks again for your work
maaschn
Hi maaschn. That’s quite how it’s meant to work. Are these indices ones of an array you input? If so, they are implicit and hence not meant to be stored. Try making the actual values of the array/collection structs/tuples that contain the relevant indices.
There is indeed a bug in Remove(). I’ve attached a fixed version – which I havent tested exhaustively but should work better
BROKEN CODE:
if (node.LeftChild == null && node.RightChild == null)
return null;
if (node.RightChild != null)
{
node.Value = FindMinimum(node.RightChild, dimension, depth + 1);
}
else
{
node.Value = FindMinimum(node.LeftChild, dimension, depth + 1);
node.LeftChild = null;
}
node.RightChild = Remove(node.Value, node.RightChild, depth + 1);
FIXED CODE:
if (node.RightChild != null)
{
node.Value = FindMinimum(node.RightChild, dimension, depth + 1);
node.RightChild = Remove(node.Value, node.RightChild, depth + 1);
}
else if (node.LeftChild != null)
{
node.Value = FindMinimum(node.LeftChild, dimension, depth + 1);
node.RightChild = Remove(node.Value, node.LeftChild, depth + 1);
node.LeftChild = null;
}
else
{
node = null;
}
Thanks for the suggestion. Would you mind briefly summarising what you observed as the problem with the original code?
The original code finds the minimum value in node.LeftChild and copies it up to the pivot node. It then proceeds to remove it from node.RightChild (which is always null) instead of node.LeftChild. So you end up with duplicate nodes in the tree.
There is actually another bug in FindMinumum that causes Remove to corrupt the tree. I dont know how to translate the fix to your code – I’ll leave that up to you.
You need to make sure you return the SMALLEST of leftMinValue and rightMinValue. So you need to compare them to each other. Right now, you favor leftMinValue.
This can corrupt the tree because you may copy a node up to the pivot that does not respect the splittingDimension (return node.Value).
Okay, so re: the first suggestion I think you’re right actually. You changed my code more than necessary, but I get the idea.
Maybe you could post a reproduction case for the second possible bug? One for the first would be appreciated either; for unit tests, but I should be able to do that anyway.
Bernie,
Regarding both bugs, I think I’ve managed to fix them now. Your suggested change for the first issue seems to make sense, so I’ve included that verbatim. The second wasn’t too tricky, but you can have a look at that too.
Cheers, A.
Hi Noldorin, aplologies for being slow to reply – I’ve been busy with work recently. I have a different KdTree implementation and my FindMinumum looks quite different to yours so its a bit hard for me to evaluate your implementation.
Its not very scientific – but I’d suggest writing a unit test that creates a list of items with random locations, add those items to your KdTree, and then remove one item at a time from both the list and the tree. Then you can ensure that removal works using CollectionAssert. It does not guarantee correctness, but it will quickly catch big issues.
My implementation is also hardened to support two items with the same location (and removal of a specific one). But I have simple equality rules – so that was easy to implement for me.
Hi Noldorin, aplologies for being slow to reply – I’ve been busy with work recently. I have a different KdTree implementation and my FindMinumum looks quite different to yours so its a bit hard for me to evaluate your implementation.
Its not very scientific – but I’d suggest writing a unit test that creates a list of items with random locations, add those items to your KdTree, and then remove one item at a time from both the list and the tree. Then you can ensure that removal works using CollectionAssert. It does not guarantee correctness, but it will quickly catch big issues.
My implementation is also hardened to support two items with the same location (and removal of a specific one). But I have simple equality rules – so that was easy to implement for me.
Very nice library! I’m gonna give the author a great big thank you. This is really great.