LINQ to YAML

LINQ to XML is one of the many technologies introduced with the .NET Framework 3.5, and one that is certainly a step forward in terms of usability. It allows querying in both the functional style (using LINQ and lambda expressions) and the more traditional imperative one, meaning that it’s a great tool for concisely working with XML data in any sort of application, and undoubtedly a significant improvement over the old XML DOM that resides in the System.Xml namespace.
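To make the two styles concrete, here is a minimal sketch using System.Xml.Linq. The XML fragment and its element names (items, item, price, partNo) are invented purely for illustration, as are the class and method names; the LINQ to XML calls themselves (XElement.Parse, Elements, Element, Attribute and the explicit conversions) are the standard ones.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class LinqToXmlDemo
{
    // Invented sample data, used only for this illustration.
    public const string Xml =
        "<items>" +
        "<item partNo='A4786'><price>1.47</price></item>" +
        "<item partNo='E1628'><price>100.27</price></item>" +
        "</items>";

    // Functional style: a query composed from lambda expressions.
    public static List<string> CheapPartsFunctional(XElement root, decimal limit)
    {
        return root.Elements("item")
                   .Where(i => (decimal)i.Element("price") < limit)
                   .Select(i => (string)i.Attribute("partNo"))
                   .ToList();
    }

    // Imperative style: the same result with an explicit loop.
    public static List<string> CheapPartsImperative(XElement root, decimal limit)
    {
        var result = new List<string>();
        foreach (XElement item in root.Elements("item"))
        {
            if ((decimal)item.Element("price") < limit)
            {
                result.Add((string)item.Attribute("partNo"));
            }
        }
        return result;
    }

    static void Main()
    {
        XElement root = XElement.Parse(Xml);
        // Both lines print: A4786
        Console.WriteLine(string.Join(", ", CheapPartsFunctional(root, 10m).ToArray()));
        Console.WriteLine(string.Join(", ", CheapPartsImperative(root, 10m).ToArray()));
    }
}
```

Both methods walk the same in-memory tree; the only difference is whether the filtering is expressed as a query or as a loop.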

In the spirit of LINQ, and with the advent of YAML, I recently decided it was about time that this new “markup language” was integrated with LINQ. Surprisingly, nothing akin to LINQ to YAML seems to exist yet (though there are a couple of fairly usable implementations of a YAML reader/writer for .NET). This seemed to me like a good chance to create something that might be used by more than the odd .NET developer or two. My plan is to implement a LINQ to YAML provider either from scratch or on top of one of the existing YAML libraries. (Which option I choose will depend on the state of the existing projects, which I haven’t yet investigated properly. I suspect, however, that it might be worthwhile writing my own, since it would a) teach me all the intricacies of YAML, and b) allow me to support the latest version [1.2], which the existing libraries do not.)

Before I launch into an overview of my intended implementation, here is a little bit about YAML itself, for those who aren’t already familiar with it. Technically, YAML isn’t a markup language (after all, the recursive acronym stands for YAML Ain’t Markup Language) but rather a serialisation format. Even so, in a variety of common situations it essentially fulfils the role that XML traditionally has. I’m not going to try to sell the format to you right now; suffice it to say that you wouldn’t have reached this far in the post if you weren’t already at least intrigued! The format is actively gaining popularity because of its ultra-lightweight syntax and suitability for hand editing, perhaps the two points that best summarise its advantages over XML.

Anyway, here’s a short example of a YAML document (taken straight from the Wikipedia page), so you can see precisely how pleasant it is to work with (at least for humans).

receipt:     Oz-Ware Purchase Invoice
date:        2007-08-06
customer:
    given:   Dorothy
    family:  Gale

items:
    - part_no:   A4786
      descrip:   Water Bucket (Filled)
      price:     1.47
      quantity:  4

    - part_no:   E1628
      descrip:   High Heeled "Ruby" Slippers
      price:     100.27
      quantity:  1

bill-to:  &id001
    street: |
            123 Tornado Alley
            Suite 16
    city:   East Westville
    state:  KS

ship-to:  *id001

specialDelivery:  >
    Follow the Yellow Brick
    Road to the Emerald City.
    Pay no attention to the
    man behind the curtain.
...

Of course, the great thing about YAML, which this example demonstrates clearly, is that you don’t need any real knowledge of the format to understand exactly and immediately what the data represents, and as a bonus it doesn’t hurt your eyes to stare at for too long! Even the referencing syntax should be fairly self-evident: &id001 defines an anchor on the bill-to address, and *id001 refers back to it, so ship-to reuses the same data. (The & and * notation would surely be nothing new to C programmers.)

The semantics as well as the syntax of YAML obviously differ greatly from those of XML, although there is almost always some sort of correspondence between the features that the two formats offer. The only notable feature missing in comparison with XML is attributes, and their usefulness is questionable anyway.

Right, so now I ought to explain a bit about how I actually plan to design this library. The basic framework will be virtually equivalent to that of LINQ to XML. In other words, the hierarchy will be largely based around an abstract YamlObject (YObject?) class, and will look very much like the one contained within System.Xml.Linq.
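As a rough sketch of what such a hierarchy might look like, consider the following. Every type name here (YObject, YNode, YScalar, YSequence, YMapping) is hypothetical and nothing is settled; in particular, restricting mapping keys to strings is a simplification made only to keep the example short.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical base class, loosely playing the role XObject plays in LINQ to XML.
public abstract class YObject
{
    public YObject Parent { get; internal set; }
}

// Anything that can appear as a value in a document.
public abstract class YNode : YObject { }

public sealed class YScalar : YNode
{
    public string Value { get; private set; }
    public YScalar(string value) { Value = value; }
}

public sealed class YSequence : YNode
{
    private readonly List<YNode> items = new List<YNode>();
    public IEnumerable<YNode> Items { get { return items; } }

    public YSequence(params YNode[] nodes)
    {
        foreach (YNode node in nodes)
        {
            node.Parent = this;
            items.Add(node);
        }
    }
}

public sealed class YMapping : YNode
{
    // Simplification: keys are plain strings in this sketch.
    private readonly Dictionary<string, YNode> entries = new Dictionary<string, YNode>();

    public YNode this[string key] { get { return entries[key]; } }

    // Returns this mapping so that calls can be chained.
    public YMapping Add(string key, YNode value)
    {
        value.Parent = this;
        entries.Add(key, value);
        return this;
    }
}

static class HierarchyDemo
{
    // Builds the "customer" part of the invoice example by hand.
    public static YMapping BuildCustomerDocument()
    {
        YMapping customer = new YMapping()
            .Add("given", new YScalar("Dorothy"))
            .Add("family", new YScalar("Gale"));
        return new YMapping().Add("customer", customer);
    }

    static void Main()
    {
        YMapping doc = BuildCustomerDocument();
        var given = (YScalar)((YMapping)doc["customer"])["given"];
        Console.WriteLine(given.Value); // prints "Dorothy"
    }
}
```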

[Figure: LINQ to XML class hierarchy]

Though LINQ to YAML must of course accommodate the unique nature of the format, I would initially aim for minimal difference and only adjust the hierarchy significantly when it proves necessary. Classes such as XCData and XDocumentType would not apply at all to YAML, yet there would need to be a place for a YReference or the like somewhere in the hierarchy. The referencing aspect of YAML will likely prove to be one of the more interesting challenges: while YAML’s lists, maps (dictionaries), and combinations thereof seem relatively straightforward to model on the LINQ to XML design, references introduce a substantially novel concept. Some form of lazy evaluation, in which a reference is resolved to the concrete node it points to only when needed, should be able to solve the problem, but at this moment there’s no way to predict how well this might work in practice.
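One way the lazy resolution of references might be sketched is shown below. This is only an illustration under assumptions of my own: the YAlias type, its constructor, and the idea of a loader keeping a dictionary of anchors are all hypothetical, and YNode is reduced to a stub. The only framework feature relied on is Lazy<T>, which defers a computation until its Value is first read and then caches the result.

```csharp
using System;
using System.Collections.Generic;

// Stub node type for this sketch only.
public class YNode
{
    public string Text { get; private set; }
    public YNode(string text) { Text = text; }
}

// Hypothetical alias node: remembers an anchor name and resolves it on demand.
public sealed class YAlias
{
    private readonly Lazy<YNode> target;

    public string AnchorName { get; private set; }

    public YAlias(string anchorName, IDictionary<string, YNode> anchors)
    {
        AnchorName = anchorName;
        // The lookup is deferred until Target is first read; the result is then cached.
        // An unknown anchor name would surface as a KeyNotFoundException at that point.
        target = new Lazy<YNode>(() => anchors[anchorName]);
    }

    // The concrete node the alias refers to.
    public YNode Target { get { return target.Value; } }
}

static class AliasDemo
{
    static void Main()
    {
        // Assumed to be filled in by a (hypothetical) loader as it meets "&name" anchors.
        var anchors = new Dictionary<string, YNode>();

        // Created before the anchor is registered, purely to show that the lookup is deferred.
        var shipTo = new YAlias("id001", anchors);

        var billTo = new YNode("123 Tornado Alley, Suite 16");
        anchors["id001"] = billTo;

        // The alias resolves to the very same object, not a copy.
        Console.WriteLine(object.ReferenceEquals(shipTo.Target, billTo)); // prints "True"
    }
}
```

Resolving to the same object rather than a copy is what would let an edit to bill-to show up under ship-to as well, which may or may not be the behaviour the final design wants.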

What I realised only after deciding to create a LINQ to YAML library is that, among LINQ providers, LINQ to XML is somewhat special in that its LINQ aspect is built on top of LINQ to Objects (i.e. LINQ over IEnumerable<T> objects), with only a relatively small number of extension methods specific to LINQ to XML. By contrast, most other LINQ providers (LINQ to SQL among others) require you to implement the IQueryable and IQueryProvider interfaces, which involves complex logic for interpreting and evaluating expression trees and returning the results. All this means that I can pretty much just design a DOM in a certain style (i.e. one suited to functional code, like LINQ to XML), and let LINQ to Objects do everything else for me.
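The point can be illustrated with a deliberately tiny stand-in for a YAML DOM. The YNode class below and its members are hypothetical and far simpler than a real design would be; what matters is that once a node exposes its descendants as an IEnumerable<T> (here through a C# iterator), the standard query operators and query syntax work without any provider code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, heavily simplified node: an optional key, an optional value, and children.
public class YNode
{
    public string Key { get; private set; }
    public string Value { get; private set; }
    public List<YNode> Children { get; private set; }

    public YNode(string key, string value, params YNode[] children)
    {
        Key = key;
        Value = value;
        Children = new List<YNode>(children);
    }

    // Lazily yields every node beneath this one, depth-first.
    public IEnumerable<YNode> Descendants()
    {
        foreach (YNode child in Children)
        {
            yield return child;
            foreach (YNode descendant in child.Descendants())
            {
                yield return descendant;
            }
        }
    }
}

static class QueryDemo
{
    // A hand-built fragment of the invoice example's "items" list.
    public static YNode SampleItems()
    {
        return new YNode("items", null,
            new YNode(null, null,
                new YNode("part_no", "A4786"),
                new YNode("price", "1.47")),
            new YNode(null, null,
                new YNode("part_no", "E1628"),
                new YNode("price", "100.27")));
    }

    // Plain LINQ to Objects: no IQueryable or IQueryProvider involved.
    public static string[] PartNumbers(YNode root)
    {
        var query = from node in root.Descendants()
                    where node.Key == "part_no"
                    select node.Value;
        return query.ToArray();
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", PartNumbers(SampleItems()))); // prints "A4786, E1628"
    }
}
```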

As I can’t think of anything more worth mentioning about my project at this time, I shall leave the more specific and complex details to a future post. Still, do by all means feel free to query me about my plans. I would be glad to answer any questions, and even gladder to receive suggestions as to how you think I might design LINQ to YAML, or simply a nod that you might find this useful at some point. I don’t anticipate this being a very long project, though both my work and free-time schedules are likely to be fairly messed up for the next month or two, so I’m not going to promise when I’ll get around to my initial release. Whenever that happens, I will duly post the link to the project page on Launchpad (or wherever I happen to host it).

3 Responses to “LINQ to YAML”

  1. matt kocaj says:

    So how far along with this are you? I’m in the market for a YAML provider that supports LINQ

  2. Hi.. Nice sounding project, How far have you got with this. Really interested in the support for yaml

  3. Gerard says:

    I first learned about YAML when I started working with Ruby on Rails. Ever since, I’m missing a YAML implementation in .NET that is as easy to work with as the one in Ruby. So, a LINQ to YAML implementation would be very cool!
