Markov Chain Generator in .NET

As part of  my current IRC.NET project (an IRC client library for .NET 4.0), I decided to create as a sample project an IRC bot that implements a Markov text generator, one of the many applications of the Markov chain, a particularly concept in probability theory. I am going to assume here that you already know what a Markov chain is and have some idea of its potential applications.

Here is the relevant C# 3.0 source code from my sample project that contains all the functionality relating to Markov chains and Markov generation. A rather naive implementation, I would freely admit, but a simple and effective one, I’d like to think.

MarkovChain class

using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Diagnostics;
using System.Linq;
using System.Text;

namespace MarkovChainTextBox
{
    // Represents a Markov chain of arbitrary length.
    [DebuggerDisplay("{this.nodes.Count} nodes")]
    public class MarkovChain<T>
    {
        private static readonly IEqualityComparer<T> comparer = EqualityComparer<T>.Default;

        private readonly Random random = new Random();

        private List<MarkovChainNode<T>> nodes;
        private ReadOnlyCollection<MarkovChainNode<T>> nodesReadOnly;

        public MarkovChain()
        {
            this.nodes = new List<MarkovChainNode<T>>();
            this.nodesReadOnly = new ReadOnlyCollection<MarkovChainNode<T>>(this.nodes);
        }

        public ReadOnlyCollection<MarkovChainNode<T>> Nodes
        {
            get { return nodesReadOnly; }
        }

        public IEnumerable<T> GenerateSequence()
        {
            var curNode = GetNode(default(T));
            while (true)
            {
                if (curNode.Links.Count == 0)
                    break;
                curNode = curNode.Links[random.Next(curNode.Links.Count)];
                if (curNode.Value == null)
                    break;
                yield return curNode.Value;
            }
        }

        public void Train(T fromValue, T toValue)
        {
            var fromNode = GetNode(fromValue);
            var toNode = GetNode(toValue);
            fromNode.AddLink(toNode);
        }

        private MarkovChainNode<T> GetNode(T value)
        {
            var node = this.nodes.SingleOrDefault(n => comparer.Equals(n.Value, value));
            if (node == null)
            {
                node = new MarkovChainNode<T>(value);
                this.nodes.Add(node);
            }
            return node;
        }
    }
}

MarkovChainNode class

using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Diagnostics;
using System.Linq;
using System.Text;

namespace MarkovChainTextBox
{
    // Represents a node within a Markov chain.
    [DebuggerDisplay("Value: {this.value == null ? \"(null)\" : this.value.ToString()}, {this.links.Count} links")]
    public class MarkovChainNode<T>
    {
        private T value;
        private List<MarkovChainNode<T>> links;
        private ReadOnlyCollection<MarkovChainNode<T>> linksReadOnly;

        public MarkovChainNode(T value)
            : this()
        {
            this.value = value;
        }

        public MarkovChainNode()
        {
            this.links = new List<MarkovChainNode<T>>();
            this.linksReadOnly = new ReadOnlyCollection<MarkovChainNode<T>>(this.links);
        }

        public T Value
        {
            get { return this.value; }
            set { this.value = value; }
        }

        public ReadOnlyCollection<MarkovChainNode<T>> Links
        {
            get { return linksReadOnly; }
        }

        public void AddLink(MarkovChainNode<T> toNode)
        {
            this.links.Add(toNode);
        }
    }
}

As usual, I am keen to hear any sort of feedback about what is a useful little piece of code. Undoubtedly, markov chains have a number of pretty interesting applications in science and the computer world, so it would be cool to hear if other people are using this for different purposes…

2 Responses to “Markov Chain Generator in .NET”

  1. David Laban says:

    Looks like a pretty simple implementation. I’d probably be tempted to keep links as a list of (count, node) but in your case count would probably be 1 most of the time and it would just complicate your sampling.

    I guess it depends how big you want your training data to be. In your case, with a 1st order markov model there’s no point in training it very far. Read the intro to Shannon’s paper for a simple exploration of other markov models.

  2. Noldorin says:

    Yeah, I also thought about maintaining a list of (count, node), but it would seem to complicate the implementation a fair bit, in the way you say. We’ll see, though – I might end up using that approach anyway for the sake of memory efficiency.

    If I ever get around to expanding my sample bot, I’ll definitely have a look at modifying the code to work with higher-order Markov models. Since I’m just trying to show off my IRC lib, and not really the wonders of Markov chains, that may not happen soon however.

RSS feed for comments on this post. And trackBack URL.

Leave a Reply


%d bloggers like this: