Thursday, June 25, 2009

Parsing MGraph

OK, so you have created your MGrammar and have a project that turns your user's input from text into MGraph. Now what? You have a few options. For my little toy project I want to take the user's input and turn it into a graph of POCO objects that I can do something with. When defining your grammar you have the opportunity to define the projections, that is, how you want your MGraph to look. You can find a great article on MGraph at MSDN.

Intellipad is great when defining, testing and building the projections for your grammar.  Here is the relevant portion of the MGraph generated by the About My Pets grammar:

Sample MGraph output fragment. This would be considered one record within MGraph.

{
    PetInfo => {
      Name => "Razor",
      AnimalType => "Terrier",
      Age => "9",
      Sex => "female",
      PrimaryColor => "white",
      SecondaryColor => "brown"
    },
    Activities => {
      {
        Type => "jump",
        Minutes => null,
        Seconds => "12"
      },
      {
        Type => "bark",
        Minutes => "2",
        Seconds => null
      }
    },
    Meals => {
      {
        Cups => "1",
        MealTime => "breakfast"
      },
      {
        Cups => "2",
        MealTime => "dinner"
      }
    }
  }

This is nice to review but what can we do with it?

First off, let’s generate some XAML from our input. I found this was a good way to understand what was going on under the hood, as well as some of the terminology.

 

Configuration to run M Command Line Tools

First off, we need to configure a command prompt to make it a little easier to work with the Oslo command line tools. To do this, create a simple .CMD file somewhere and add the following text:

@set PATH=%PATH%;%programfiles%\Microsoft Oslo\1.0\bin

Then create a shortcut to that file with the following information:

image

 

Click on that shortcut to open the command prompt. You should now have the Oslo tools in your path. To test this, simply type “m” and you should see something like this:

image

Compiling our Grammar

Now back to our regularly scheduled post. First we need to compile our grammar, so change to the directory where your file was saved and type:

C:\MyDirectory>m [MyGrammarFile].mg

When you do so, you should see the following:

image

This creates a compiled version of your grammar with an MX extension. Confirm this by looking at the contents of the directory.

image 

Generate some XAML for your User Input

Now we are ready to do something with our input, so save a sample of user input into a text file named something like UserInput.txt.

The sample I saved is similar to:

About my pets

Razor is a Terrier that is a
9 year old female, and her color is white. 
She will jump for 15 seconds and bark for 2 minutes. 
She eats 1 cup of food for breakfast and
eats 2 cups of food for dinner.

Rocket is a Rat Terrier that is a 6 year old male,
and his color is black.  He will whine for 3 minutes.
He eats 2 cups of food for dinner

Next run the following:

C:\MyDirectory>mgx /r:[MyGrammarFile].mx UserInput.txt /t:xaml

 

This will generate a XAML file from your sample user input, or report any errors that occurred.

From my sample grammar, here is a portion of the generated XAML.

image

 

Getting at this data in .NET

This is nice, and may be a little easier to parse, but what I really want to do is deserialize the MGraph into a tree of POCOs (plain old CLR objects) so I can do some “stuff” with the data. I built a simple method that loads the compiled grammar and builds an in-memory representation of the parsed user input with an instance of NodeGraphBuilder from the Oslo runtime library.

Before you get started, you need to add references to your project:

using Microsoft.M;
using System.Dataflow;

Both can be found in the <Program Files>\Microsoft Oslo\1.0\bin directory.

 

Create our Parser

First we need a class that knows how to format parsing errors. It inherits from System.Dataflow.ErrorReporter.

using System;
using System.Dataflow;
using System.Linq;

namespace PetParser
{
    class ParserErrorReporter : ErrorReporter
    {
        protected override void OnError(ErrorInformation errorInformation)
        {
            string msg = string.Format(errorInformation.Message, errorInformation.Arguments.ToArray());

            throw new FormatException(
                string.Format("Syntax error at [{0}, {1}]: {2}",
                errorInformation.Location.Span.Start.Line,
                errorInformation.Location.Span.Start.Column,
                msg));
        }
    }
}

Parse the User Input

Next we’ll create a simple method to load our compiled grammar, build a parser instance, perform the parse, and hand back the MGraph representation as an object graph of System.Dataflow.Node instances.

public void ParseGrammar(string grammar)
{
    //Load the grammar that was compiled into the assembly as a resource.
    using (var img = MImage.LoadFromResource(System.Reflection.Assembly.GetExecutingAssembly(), "PetParser.mx"))
    {
        //Load up the specific MGrammar file.
        var factory = img.ParserFactories["Pets.MyPets"];

        //Create our parser.
        var parser = factory.Create();

        //Create an instance of NodeGraphBuilder to let the
        //parser know what type of graph to build.
        parser.GraphBuilder = new NodeGraphBuilder();

        try
        {
            var grammarTextStream = new StringTextStream(grammar);
            //Attempt to parse the grammar, any errors will be
            //formatted via the ParserErrorReporter and handed
            //via Try/Catch
            var root = (Node)parser.Parse(grammarTextStream, new ParserErrorReporter());                   

            //Start walking at the root, which is level 0.
            WalkTree(root, 0);

        }
        catch (FormatException exc)
        {
            //If we get an error parsing the grammar, just write it (for now)
            LogIt(0, exc.Message);
        }
    }
}

Walk the System.Dataflow.Node tree

void WalkTree(Node node, int level)
{
    //Nothing to walk if the edge had no target node.
    if (node == null)
        return;

    foreach (Edge recordEdge in node.Edges)
    {
        var value = string.Empty;
        if (recordEdge.Node != null && recordEdge.Node.AtomicValue != null)
            value = string.Format(@"Value: ""{0}""", recordEdge.Node.AtomicValue.ToString());

        LogIt(level, "{0}. [{1}] Brand Text: \"{2}\" Label Text: \"{3}\"  {4}", level, node.NodeKind, node.Brand.Text, recordEdge.Label.Text, value);

        //Recurse one level deeper for the child node.
        WalkTree(recordEdge.Node, level + 1);
    }
}
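
The LogIt helper called in the snippets above is not part of Oslo and is not listed in this post. Here is a minimal sketch of what such a helper might look like, assuming all it needs to do is indent each line by its level and write it to the console. The TreeLog class and the FormatLine method are names made up for this illustration.

```csharp
using System;

namespace PetParser
{
    // Hypothetical helper class; the name and shape are assumptions for illustration.
    static class TreeLog
    {
        // Builds one output line, indented two spaces per tree level.
        public static string FormatLine(int level, string format, params object[] args)
        {
            string indent = new string(' ', Math.Max(level, 0) * 2);

            // With no arguments, treat the text as a literal so that a message
            // containing braces (such as a parser error) does not break string.Format.
            if (args == null || args.Length == 0)
                return indent + format;

            return indent + string.Format(format, args);
        }

        // Stand-in for LogIt: writes the formatted line to the console.
        public static void LogIt(int level, string format, params object[] args)
        {
            Console.WriteLine(FormatLine(level, format, args));
        }
    }
}
```

Keeping the formatting in its own method makes it easy to swap the console for a file or a text box later without touching the tree walker.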

Output

Now when we execute this we get the following output:

image

 

The following table is from the MSDN Page “MGraph Object Model”

image

Now that we are walking the tree, we just need to create and populate instances of our simple .NET types. This could be done with some sort of state machine or an implementation of the visitor pattern.
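
To make that a bit more concrete, here is a rough sketch of the populate step. It is neither a state machine nor a full visitor, just a plain label-driven mapping. It also does not use the Oslo Node type: SimpleNode is a hypothetical stand-in I made up so the example is self-contained, and Pet, Activity, Meal, and PetMapper are likewise illustrative names whose shape is assumed from the sample record shown earlier.

```csharp
using System.Collections.Generic;

namespace PetParser
{
    // Hypothetical stand-in for a labeled graph node. This is NOT the Oslo Node type.
    class SimpleNode
    {
        public string Label;   // null for unlabeled records inside a collection
        public string Value;   // null for non-leaf nodes and for MGraph nulls
        public List<SimpleNode> Children = new List<SimpleNode>();

        public SimpleNode(string label, string value, params SimpleNode[] children)
        {
            Label = label;
            Value = value;
            Children.AddRange(children);
        }
    }

    // Simple .NET types mirroring the sample record.
    class Activity { public string Type; public int? Minutes; public int? Seconds; }
    class Meal { public int? Cups; public string MealTime; }
    class Pet
    {
        public string Name, AnimalType, Sex, PrimaryColor, SecondaryColor;
        public int? Age;
        public List<Activity> Activities = new List<Activity>();
        public List<Meal> Meals = new List<Meal>();
    }

    static class PetMapper
    {
        // Returns the value of the first child with the given label, or null.
        static string Leaf(SimpleNode parent, string label)
        {
            foreach (SimpleNode child in parent.Children)
                if (child.Label == label) return child.Value;
            return null;
        }

        // The sample record carries every value as text, so numbers are parsed here.
        // Missing or non-numeric text becomes null.
        static int? ToInt(string text)
        {
            int parsed;
            return int.TryParse(text, out parsed) ? parsed : (int?)null;
        }

        // Maps one record (PetInfo, Activities, Meals) onto a Pet instance.
        public static Pet ToPet(SimpleNode record)
        {
            var pet = new Pet();
            foreach (SimpleNode section in record.Children)
            {
                switch (section.Label)
                {
                    case "PetInfo":
                        pet.Name = Leaf(section, "Name");
                        pet.AnimalType = Leaf(section, "AnimalType");
                        pet.Age = ToInt(Leaf(section, "Age"));
                        pet.Sex = Leaf(section, "Sex");
                        pet.PrimaryColor = Leaf(section, "PrimaryColor");
                        pet.SecondaryColor = Leaf(section, "SecondaryColor");
                        break;
                    case "Activities":
                        foreach (SimpleNode a in section.Children)
                            pet.Activities.Add(new Activity
                            {
                                Type = Leaf(a, "Type"),
                                Minutes = ToInt(Leaf(a, "Minutes")),
                                Seconds = ToInt(Leaf(a, "Seconds"))
                            });
                        break;
                    case "Meals":
                        foreach (SimpleNode m in section.Children)
                            pet.Meals.Add(new Meal
                            {
                                Cups = ToInt(Leaf(m, "Cups")),
                                MealTime = Leaf(m, "MealTime")
                            });
                        break;
                }
            }
            return pet;
        }
    }
}
```

With the real Oslo types, the same idea would apply while walking the edges and labels of each Node, but the exact property names would need to be checked against the runtime library.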

I wouldn’t be very surprised if upcoming releases of Oslo include built-in mechanisms to create some sort of .NET code/assemblies and provide automatic serialization/deserialization using MSchema, but for now this isn’t too bad.

Next task is to build a front end to collect user input. I’m thinking along the lines of an Azure-hosted Silverlight application, with the grammar passed to the server via WCF and all the actual parsing happening there.

-ec
