KD-Trees for .NET

The kd-tree is a well-known data structure, which alongside quadtrees and octrees, is commonly used for the task of space partitioning. That is, it aims to represent a set of points in k-dimensional space in a way that is efficient to access. Kd-trees, as do all binary trees, have the notable advantage of an $O(\log n)$ average search time, far superior to the naive $O(n)$ search.

While such trees perhaps have their best known application in computer graphics, I discovered them here in relation to a computational physics project on which I am currently working. For those who are curious about the context, I have recently been doing some statistical analysis on the distribution stars using the Hipparcos catalogue. Unfortunately due to sheer number of stars (100,000+), performing a nearest neighbour search or range search on each star in 3D (Euclidean) space is simply infeasible without the use of a clever algorithm. This is how I discovered the wonder of kd-trees, reducing a task that would otherwise take days to several minutes.

The software I am writing for this project happens to be in C# 4.0, and upon a bit of investigation, I discovered that there exists no (decent) implementation of the kd-tree structure and its algorithms for .NET.  Thus, with a bit of help from the Wikipedia article and a paper I dug up, I decided to have a go at a generic implementation. Now that it is complete and tested, I can gladly declare that the job was not half as easy as I expected (but somehow quite rewarding)!

The following features have been implemented (and fully documented with XML comments).

• Tree construction
• Dynamic node addition
• Dynamic node removal
• Nearest neighbour search
• Range search

I’ve now uploaded the complete code for the implementation, which is available below as several C# code files. Note that you will also need the (wonderful) Math.NET Numerics library to compile things – it provides the generic vector type and a few other handy things. (You can probably adapt the code easily enough if you don’t fancy having  this dependency.)

KdTree.cs

KdTreeNode.cs

Arithmetic.cs

Tests (MSTest framework)

KdTreeTests.cs

The idea is that the KdTree class is available to use as a black box, so let me know if anything is unclear. (If you’re wondering though, the arithmetic stuff is there to provide generic arithmetic support complementing the generic vector support. You’ll probably just want to use double most of the time.) Finally, if you’re interested in the unit tests, you can grab them over at the project repository. Enjoy.

39 Responses to “KD-Trees for .NET”

1. Dave says:

Hey – Noldorin – this looks great – I have been using the Numerical Recipes impl but calling it back and forth from a C# harness which is SLOWWW. Two things are not clear to me however. Am I able to retreive items which are within a radius of r of say point P ? (Is that the public IEnumerable FindInRange(Vector location, TField range) method ) Also, TField and TValue ?? I have a list of Point structures which I want to put into the tree. What is the signature for my KDtree initialisation ? Many thanks for this (my project is a toy Lego robot navigation project ) Many thanks again

2. Dave says:

Hi -
I posted earlier – please ignore – having looked at the unit tests all is clear – this is a wonderful piece of code !! Many thanks.

I was going to port NRC, but this has saved a whole pile of aggravation.

Great job.

Best
Dave

3. Noldorin says:

Glad to hear that. And yeah, hopefully the unit tests act as the documentation! I use Math.NET for the vector-based stuff in code, but it should be easy enough to substitute that dependency.

Good luck with your project; sounds fun!

4. Qua says:

It looks great, it’s really sad that it depends on Math.Net Numerics though. I’m not quite sure whether it would be easiest to modify this to avoid this dependency or write it from scratch

5. Noldorin says:

Yeah, the Math.NET Numerics dependency is purely because I originally wrote this code for a computational physics project, and we tied into that dependency anyway. You can abstract it easily enough by writing your own simple Vector classes, I think. Give it a go and let me know if you run into any specific problems.

6. Qua says:

I ended up writing the whole data structure myself. Your version is built to be very flexible (working on any range of dimension and data types) and could for such be used in a data structure framework, whereas I built mine very specific for 2D using the “vector” class that was present in my project.

Thanks for the inspiration though

7. Noldorin says:

Well, I’m glad it helped at least. The algorithmic implementations are there, for sure. But I agree, it is meant to be a pretty generic class, so I can understand a rewrite for your specific needs…

8. Yak Fatzko says:

Hey Qua – any chance of you sharing that code? I need a 2D implementation for a mapping project. If you can contribute it would be deeply appreciated. Thanks. yakadum @ yahoo . com

9. Noldorin says:

No, you *don’t* need a 2D implementation. This is a generic n-d implementation already. If you simply want to remove the Math.NET Numerics matter, then that’s another matter – a trivial one in fact.

10. Hass says:

Noldorin, thanks for the excellent implementation. I would like to implement a specialised version but I was wondering what is the license required if any to use this code. For instance GNU GPL ?

11. Noldorin says:

Hi Hass. I’m glad you find it useful. You’re right, the code really needs a license (especially now that it’s garnering a lot of third-party interest.) I’ve updated the post to indicate that everything’s licensed under the MIT license – you should find this fairly unrestrictive.

12. Hass says:

Thanks Noldorin, that’s great

13. Curious says:

Hi!

Would it be possible to put a sample code in C# illustrating the use of the tree? Just a few lines if possible, please. I should also say that it is an excellent implementation, the quality of which is exceeding the many – if not all of – code snippets available on the internet in terms of C# examples.

14. Noldorin says:

You make a fair point. I just realised when I read your comment that the post lacked examples. For the sake of simplicity, I’ve simply made the KdTree unit tests (for the project I was working on) available for download. Hopefully this (with the comments) should be sufficient to get you going, but let me know if you need any further clarification.

15. Curious says:

Hi Noldorin,

I thank you very much for your quick response. I am sorry I forgot in my previous post to mention that I had already checked the unit test. I should say that I did not get much information partly because the Math.Net Numerics is also completely new to me. However, I figured out how it “might” work. Yet, I do not have the full confidence to my interpretation of your data structure.

If it does not much work, I kindly ask a very simple example (just a few lines) targeting how you initialize the structure and how/what you get at the end.

Regards

16. Qua says:

Are you sure that your Remove method works correctly? My implementation took origin in yours, but now that I starting to use Remove it seems strange to me. As far as I can see you only compare a single dimension and then if this single dimension value is identical you proclaim the two elements equal and remove them. There is no guarantee that the elements are equal, however, and this introduce a bug.

It might just be my implementation, but it looks like yours have the same flaw.

17. Noldorin says:

Hi Qua.

You could well be right. The unit tests for Remove are not extensive. Indeed, I did not actually use the Remove functionality actively in my scientific project.

However, I’m tempted to believe that the fix required is small. Perhaps you could provided a test case (just an extra few lines in the unit tests file) to demonstrate the problem? Would be happy to fix it then.

Cheers, Alex

18. Qua says:

Try adding the nodes (0,0) followed by (0,1) and then remove (0,1) and see if the tree is what you expected.

19. Ian says:

Hi Noldorin, thank you very much for your work on this. I’m attempting to leverage you code and ran into a problem compiling the KdTreeTests.cs. In the Find TestFindNearestNNeighbours method it looks like the test is calling a method that doesn’t appear to be implimented in the KdTree class that would allow you to find the nearest N Neighbours instead of only nearest Neighbour. Is there a version of your work that includes an overloaded FindNearestNeighbours method that will search for multiple Neighbours?

20. Noldorin says:

Hi Ian.

You’re absolutely right. I added the CS file for the unit tests at a later date, having updated the core KdTree implementation in the meanwhile (for my FermiSim project, linked above). I’ve now updated the KdTree.cs and KdTreeNode.cs files to reflect the latest revision of my project (now inactive). It provides the requisite n-nearest-neighbours implementation.

Hope that helps.

21. César says:

Hey Noldorin, looking at FindNearestNNeighbors I see that you only insert values into the list when they are within the hypersphere of smaller distance. However, this only makes sense when the size of the list is numNeighbors; before that, you should probably add to the list every value you come across. Otherwise, you might have the closest value your location at the root, which would result in a small hypersphere and no more elements would be picked. Am I missing something?

Also, why would you remove from the output a node just because it is the search location? As any other value in the tree, I would expect it goes into the output. Is there any restriction I’m not aware of?

In any case, thanks for the code. It’s very useful!

22. Noldorin says:

Hi Cesar. Sorry for the delayed response. If you’re still interested; yes, you are right that the check to see whether a node is within the hypersphere should only be made if the list is already “full”. I threw together this method quite quickly, and since I don’t believe I ever used it in my project, it was never thoroughly tested. (It passed the basic unit tests.) Thanks for pointing it out though.

I exclude nodes at the identical location to the search location for semantic reasons. In the application of my project, the input location was that of a star (arbitrary node), and we never wanted to return it back. i.e. It’s not technically one of the “nearest neighbours” of a star, but rather the star itself. This should be a trivial change, however, if your application demands it.

Glad the implementation was of help. Also, I’d certainly appreciate a patch/fix for the first issue (FindNearestNNeighbors) if you have one.

23. maaschn says:

Hello Noldorin,

thanks for that code. I actually try to export a little program to .NET which I wrote in Matlab a year ago. I used a KD-tree to speed up the determination of closest triangles. I was very glad when I found your code but now I stumbled over the fact that the indices from the unsorted raw data array get lost after the construction of the tree. Is that correct or did I miss something?

Thanks again for your work

maaschn

24. Noldorin says:

Hi maaschn. That’s quite how it’s meant to work. Are these indices ones of an array you input? If so, they are implicit and hence not meant to be stored. Try making the actual values of the array/collection structs/tuples that contain the relevant indices.

25. Bernie says:

There is indeed a bug in Remove(). I’ve attached a fixed version – which I havent tested exhaustively but should work better

BROKEN CODE:
if (node.LeftChild == null && node.RightChild == null)
return null;

if (node.RightChild != null)
{
node.Value = FindMinimum(node.RightChild, dimension, depth + 1);
}
else
{
node.Value = FindMinimum(node.LeftChild, dimension, depth + 1);
node.LeftChild = null;
}

node.RightChild = Remove(node.Value, node.RightChild, depth + 1);

FIXED CODE:
if (node.RightChild != null)
{
node.Value = FindMinimum(node.RightChild, dimension, depth + 1);
node.RightChild = Remove(node.Value, node.RightChild, depth + 1);
}
else if (node.LeftChild != null)
{
node.Value = FindMinimum(node.LeftChild, dimension, depth + 1);
node.RightChild = Remove(node.Value, node.LeftChild, depth + 1);
node.LeftChild = null;
}
else
{
node = null;
}

26. Noldorin says:

Thanks for the suggestion. Would you mind briefly summarising what you observed as the problem with the original code?

27. Bernie says:

The original code finds the minimum value in node.LeftChild and copies it up to the pivot node. It then proceeds to remove it from node.RightChild (which is always null) instead of node.LeftChild. So you end up with duplicate nodes in the tree.

28. Bernie says:

There is actually another bug in FindMinumum that causes Remove to corrupt the tree. I dont know how to translate the fix to your code – I’ll leave that up to you.

You need to make sure you return the SMALLEST of leftMinValue and rightMinValue. So you need to compare them to each other. Right now, you favor leftMinValue.

This can corrupt the tree because you may copy a node up to the pivot that does not respect the splittingDimension (return node.Value).

29. Noldorin says:

Okay, so re: the first suggestion I think you’re right actually. You changed my code more than necessary, but I get the idea.

Maybe you could post a reproduction case for the second possible bug? One for the first would be appreciated either; for unit tests, but I should be able to do that anyway.

30. Noldorin says:

Bernie,

Regarding both bugs, I think I’ve managed to fix them now. Your suggested change for the first issue seems to make sense, so I’ve included that verbatim. The second wasn’t too tricky, but you can have a look at that too.

Cheers, A.

31. Bernie says:

Hi Noldorin, aplologies for being slow to reply – I’ve been busy with work recently. I have a different KdTree implementation and my FindMinumum looks quite different to yours so its a bit hard for me to evaluate your implementation.

Its not very scientific – but I’d suggest writing a unit test that creates a list of items with random locations, add those items to your KdTree, and then remove one item at a time from both the list and the tree. Then you can ensure that removal works using CollectionAssert. It does not guarantee correctness, but it will quickly catch big issues.

My implementation is also hardened to support two items with the same location (and removal of a specific one). But I have simple equality rules – so that was easy to implement for me.

32. Bernie says:

Hi Noldorin, aplologies for being slow to reply – I’ve been busy with work recently. I have a different KdTree implementation and my FindMinumum looks quite different to yours so its a bit hard for me to evaluate your implementation.

Its not very scientific – but I’d suggest writing a unit test that creates a list of items with random locations, add those items to your KdTree, and then remove one item at a time from both the list and the tree. Then you can ensure that removal works using CollectionAssert. It does not guarantee correctness, but it will quickly catch big issues.

My implementation is also hardened to support two items with the same location (and removal of a specific one). But I have simple equality rules – so that was easy to implement for me.

33. Eric Rini says:

Very nice library! I’m gonna give the author a great big thank you. This is really great.

34. Ravi says:

Hi,

Thank you for the awesome library. Can you please share the implementation which doesn’t depend on Math.Net? Actually, I am using it for a Windows Phone application and it doesn’t seem to support Math.Net.

Thanks.

35. Sara says:

Hi

I am not using your codes yet, but I do have a question. Can I use some mathematical operation between each nodes?

36. […] found this C# K-d Tree implementation with nearest neighbor search which I used for testing the […]

37. Normandie says:

Hi,

Does our kd-tree – k-Nearest Neighbors algorithm can be used with any kind of data? For example I want to use it to compare Strings.

Thanks for your help

38. Noldorin says:

Sure, you just need a vector structure over the strings… that’s up to you how you design that though.

39. Normandie says:

I don’t see how to do that. Would you have an example?

Here is I do that with the Accord.NET Framework.
``` string[] inputs = { "un", "une", "bijou", "joyau", "trésor", "merveille", "féerie", "perle", "réel", "authentique", "véritable", "encore", "du monde", "de l'ère", "de l'époque", "chef d'oeuvre", "mont de piété", "temple" }; int[] outputs = { 0,0, 1,1,1,1,1,1,1,1,1, 2,2,2,2,2,2,2 };```

``` ```

``` int iNumberOfClasses = outputs.Distinct().Count(); KNearestNeighbors knn = new KNearestNeighbors(k: 1, classes: iNumberOfClasses, inputs: inputs, outputs: outputs, distance: Distance.Levenshtein); string[] ValuesClass = inputs.Where((x, idx) => outputs[idx] == knn.Compute("féerie")).ToArray(); ```

RSS feed for comments on this post. And trackBack URL.